Abstract

An algorithm for constructing and adaptively maintaining
routing tables in communication networks is presented. The algorithm
can be employed in store-and-forward as well as line-switching
networks, uses distributed computation, provides routing tables
that are loop-free for each destination at all times, adapts to
changes in network flows and is completely failsafe. The latter
means that after arbitrary failures and additions, the network
recovers in finite time in the sense of providing routing paths
between all physically connected nodes. Complete rigorous proofs
of all these properties are provided.

1.  INTRODUCTION

Reliability  and  the  ability  to  recover  from  topological  changes  are 
properties  of  utmost  importance  for  smooth  operation  of  data-communication 
networks.  In  today's  data  networks  it  happens  occasionally,  more  or  less
often  depending  on  the  quality  of  the  individual  devices,  that  nodes  and 
communication  links  fail  and  recover;  also  new  nodes  or  links  become  opera- 
tional and  have  to  be  added  to  an  already  operating  network.  The  reliability 
of  a computer-communication  network,  in  the  eyes  of  its  users,  depends  on  its 
ability  to  cope  with  these  changes,  meaning  that  no  breakdown  of  the  entire 
network  or  of  large  portions  of  it  will  be  triggered  by  such  changes  and  that 
in  finite  - and  hopefully  short  - time  after  their  occurrence,  the  remaining 
network  will  be  able  to  operate  normally.  Unfortunately,  recovery  of  the 
network  under  arbitrary  number,  timing,  and  location  of  topological  changes 
is  hard  to  insure  and  little  successful  analytical  work  has  been  done  in 
this  direction  so  far. 

The  above  reliability  and  recovery  problems  are  difficult  whether 
one  uses  centralized  or  distributed  routing  control.  With  centralized  routing,
one  has  the  problem  of  central  node  failure  plus  the  chicken  and  egg
problem  of  needing  routes  to  obtain  the  network  information  required  to 
establish  routes.  Our  primary  concern  here  is  with  distributed  routing  that 
recovers  from  topological  changes;  here  one  has  the  problems  of  asynchronous 
computation  of  distributed  status  information  and  of  designing  algorithms 
which  adapt  to  arbitrary  changes  in  network  topology  in  the  absence  of  global 
knowledge  of  topology. 

The paper presents a distributed protocol that maintains a route
from any source to any destination in a network. The protocol is distributed
in the sense that no central tables are required and there is no global
knowledge of the routes, i.e., each node knows only which is the next node
(called the "preferred neighbor") on the route to a given destination. Each
node is responsible for updating its own tables (e.g. choosing a new
preferred neighbor) and these updates are coordinated by the protocol via
control messages sent between adjacent nodes. For a given destination, the
set of routes maintained by the protocol is loop-free at all times, and
whenever no failures occur, the routes form a spanning tree rooted at the
destination (i.e. a tree that covers all nodes).

To each link in the network, a strictly positive "distance" (or
"weight") is assigned which represents the cost of using the link. According
to utilization and possibly other factors, this distance may vary with
time following long-term trends. The length of any path is the sum of the
distances on the links of this path. Destinations may asynchronously trigger
the protocol and start update cycles to change routes according to new
distances. Such a cycle first propagates uptree while modifying the distance
estimates from nodes to the destination and then propagates downtree while
updating the preferred neighbors. Each cycle tends to find routes with
short paths from each node to the destination, and assuming time-invariance
of link weights, the strict minimum (i.e. shortest paths) will be reached
within a finite number of update iterations.

The proposed protocol also provides for recovery of routes after
failures and for additions of links or nodes to the network. When a link
fails, appropriate information is propagated backwards in the network and,
in addition, a "request" message is generated and forwarded towards the
destination. New links are brought up via a similar protocol. The request
message triggers an update cycle and it is guaranteed that within finite
time, all nodes physically connected to each destination will have a
loop-free route to it. This holds also for multiple topological changes, and
even if such changes occur while the protocol is active and the update is
in progress. The recoverability of the protocol is achieved without
employing any time-out in its operation, a feature which greatly enhances its
amenability to analysis and facilitates structured implementation.

The protocol is mainly intended for quasi-static routing in
communication networks and the routes provided by the protocol can be used in a
variety of ways for actual routing of information. Although specification of
information routing algorithms is outside the scope of the present paper,
we indicate here a few applications. In a (physical or virtual)
line-switched network, it is often impractical to reroute already established
conversations, except in case of disruption caused by failure or priority
preemption. In this case, the routes provided by the present protocol may
be used for assigning paths to new or disrupted calls. For example, in a
virtual line-switched network the link weights may represent link delays,
and then the path provided by our protocol in steady state will give the
minimum delay route for the new call. If the weights represent incremental
delay, then the path will minimize network average delay (see [1, eq. (25)]).
Other criteria, like probability of blocking, can also be taken into
consideration in the link weight. Observe that if the link weights change
drastically, the above strategy may allow new conversations to follow paths
so different from the old ones that together they form a loop, but this is
still the best one can do under the constraint that established
conversations cannot be rerouted.

Similar  strategies  can  be  used  in  networks  using  message  switching, 
where  the  preferred  neighbor  indicates  the  first  hop  of  the  present  best 
estimated  route  towards  the  SINK  and  the  node  may,  for  example,  increase  the
fraction  of  messages  routed  over  this  path  while  reducing  the  fraction 
sent  over  other  routes.  More  sophisticated  failsafe  routing  and  update 
procedures,  where  exact  amount  of  increase  and  reduction  of  traffic 
fractions  are  indicated  so  that  optimality  and  routing  loop-freedom  are 
achieved,  have  been  obtained  using  ideas  similar  to  the  protocol  of  this 
paper  and  are  presented  in  a subsequent  report  [2]. 

Finally, we may mention that the present protocol can replace the
simple-minded saturation routing that is presently used in several networks
to locate mobile subscribers and to select routing paths [3]. The protocol
of this paper has all the advantages indicated in [3, Sec. II] for
saturation routing, but requires no time-out and provides a route selected not
only on the basis of the instantaneous congestion but on averaged quantities.

This work was inspired by [4] and [5], where minimum delay routing
algorithms using distributed computation were developed. These algorithms
also maintain a per destination loop-free routing at each step. One of the
main contributions of the protocol given in the present paper is to
introduce features ensuring recovery of the routes from arbitrary topological
changes of the network. As a result, the protocol of the present paper is,
to our knowledge, the first one that is distributed and for which all the
following properties are rigorously proved:

(a)  Loop-freedom  for  routes  to  each  destination,  at  all  times. 

(b)  Independently  of  the  sequence,  location  and  quantity  of  topological
changes,  the  routes  recover  in  finite  time. 

(c)  Under  stationary  conditions,  the  routes  converge  to  paths  with 
minimal  weighted  length. 

Several routing algorithms possessing some of the properties described
above have been previously indicated in the literature. In [6], a routing
algorithm similar to the one used in the ARPA network [7], but with unity
link weights, is presented. It is shown there that at the time the algorithm
terminates, the resulting routing procedure is loop-free and provides the
shortest paths to each destination. As with the ARPA routing, however, the
algorithm allows temporary loops to be formed during the evolution of the
algorithm. The algorithm proposed in [8] ensures loop-free routing for
individual messages. This property is achieved by requesting each node to
send a probing message to the destination before each individual rerouting;
the node is allowed to actually perform the rerouting only after having received
an acknowledgement from the destination. The extra load on the network caused by
sending probing messages from each node to each destination for each rerouting
is clearly extremely large. Also, loop freedom for individual messages is a
weaker property than loop freedom for each destination. For example, in a
three-node network, sending traffic from node 3 to node 1 via node 2 and
sending traffic from node 2 to node 1 via node 3 would be loop-free for individual
messages, but not loop-free for each destination. See [9] for a more complete
discussion of loop freedom.
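
The three-node example can be checked mechanically. The following is a
minimal Python sketch, not part of the original protocol: the helper names
path_is_simple and union_has_cycle are illustrative assumptions. Each route
is a list of nodes ending at the destination; loop freedom for individual
messages asks that each list has no repeated node, while loop freedom for
the destination asks that the arcs of all routes taken together contain no
directed cycle.

```python
def path_is_simple(path):
    """A single message path is loop-free if no node is visited twice."""
    return len(set(path)) == len(path)


def union_has_cycle(paths):
    """True if the arcs used by all paths to one destination contain a directed cycle."""
    arcs = {}
    for path in paths:
        for a, b in zip(path, path[1:]):
            arcs.setdefault(a, set()).add(b)

    done, on_stack = set(), set()

    def visit(u):
        # Depth-first search; meeting a node that is still on the stack closes a cycle.
        on_stack.add(u)
        for v in arcs.get(u, ()):
            if v in on_stack or (v not in done and visit(v)):
                return True
        on_stack.discard(u)
        done.add(u)
        return False

    return any(u not in done and visit(u) for u in list(arcs))


# Routes toward destination node 1, as in the three-node example.
routes = [[3, 2, 1], [2, 3, 1]]
per_message_ok = all(path_is_simple(r) for r in routes)
per_destination_loop = union_has_cycle(routes)
```

Both routes are individually loop-free, yet the arcs 3 -> 2 and 2 -> 3
together form a cycle, which is the weaker guarantee the text refers to.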

In  addition  to  the  introduction  of  this  particular  protocol  and  the 
proofs  of  its  main  properties,  the  paper  provides  contributions  in  the 
direction  of  modeling,  analysis  and  validation  of  distributed  algorithms. 

The  operations  required  by  the  algorithm  at  each  node  are  summarized  as  a 
finite-state  machine,  with  transitions  between  states  triggered  by  the 
arrival  of  special  control  messages  from  the  neighbors,  and  the  execution 
of  a transition  may  result  in  the  transmission  of  such  messages.  Methods 
for  modeling  and  validation  of  various  communication  protocols  were  proposed 
in [10] - [13]. These methods are designed, however, to handle protocols
involving either only two communicating entities or nodes connected by a fixed
topology. The model we use to describe our algorithm is a combination of
these known models, but is extended to allow us to study a fairly complex
distributed protocol. The analysis and validation of the algorithm are
performed by using a special type of induction that allows us to prove global
properties while essentially looking at local events.

Before proceeding, we may mention two other distributed protocols
that were recently developed. In [14], an algorithm for network
resynchronization is presented and its recovery properties are proved under arbitrary
topological changes. A similar goal is obtained by R.G. Gallager in an
unpublished work [15], while also determining the paths with minimum number
of links between each pair of nodes in the network. Although there is a
great similarity between the ways in which the updating information
propagates and the distributed computation is performed by the algorithms of
[14], [15] and of the present paper, the exact relationship between these
protocols is a subject for future research.

2.  THE  PROTOCOL

To  facilitate  understanding,  we  describe  the  protocol  in  several 
steps.  We  first  present  the  "basic  protocol",  i.e.  assuming  that  no  topo- 
logical changes  occur.  Then  we  describe  the  additions  to  the  protocol  in 
case  of  link  outage  and  finally  the  additions  for  links  becoming  operational.
A node  outage  can  be  represented  as  the  outage  of  all  of  its  links  and 
similarly,  a node  becoming  operational  can  be  represented  as  links  becoming 
operational.  Therefore,  we  do  not  pay  special  attention  to  topological 
changes  caused  by  nodes. 

The  following  comments  apply  to  the  rest  of  the  paper: 

1.  Since  we  are  not  concerned  with  data  transfer,  we  use  the  term 
"message"  to  mean  the  special  control  messages  employed  by  the 
protocol. 

2.  We assume that messages sent by a node to a neighbor are received
in the same order that they are sent, i.e. FIFO is preserved in the
links (and local protocols).

3.  The protocol proceeds independently for each destination.
Consequently, for the rest of the paper we fix the destination and
present and analyse the protocol for that given destination, which
is denoted by SINK.

2.1  The  Basic  Protocol 

As already mentioned, each node i in the network has at any time
a preferred neighbor. Thus, we assume that each node i has a variable p_i
which points to that neighbor. For the basic protocol, we assume that after
initialization, the directed graph defined by the nodes {i} and arcs (i, p_i)
forms a tree directed towards (and therefore rooted at) the SINK, as
exemplified by the network of Fig. 1, where directed arcs denote the preferred
neighbors {p_i}. Subsequent sections describing the protocol which handles
topological changes will show that this assumption is justified by the
initialization procedure. Each node i also has a positive variable d_i,
maintained by the protocol, denoting an estimated distance from i to the
SINK (d_SINK is by definition equal to 0). During an update, the protocol
reevaluates the distances {d_i} and accordingly the nodes choose preferred
neighbors {p_i} in such a way that the directed graph given by the arcs
(i, p_i) remains at all times a tree rooted at the SINK.

As already mentioned in Section 1, to each link (i,l) a strictly
positive "distance", denoted by d_il, is assigned. We assume all links to
be full duplex and allow a link to have a different distance in each
direction. The distance d_il is allowed to vary with time and needs to be
known (measured or estimated) only by node i. The protocol tends to
minimize the distance d_i from each node i to the SINK, where this distance
is given by the sum of the weights d_il on the directed path from a node
to the SINK.

As described below, the SINK may asynchronously start update cycles
to change routes according to new distances. Such a cycle first modifies
distance estimates {d_i} uptree and then modifies preferred neighbors {p_i}
downtree. An update cycle is started by the SINK by sending a message
MSG(d_SINK) to each of its neighbors (notice that MSG(d_SINK) = MSG(0) by
definition). When a node, say i, receives a message from its p_i, it
reevaluates its estimated distance d_i and transmits MSG(d_i) to each of
its neighbors except p_i. Notice that the spanning tree structure mentioned
before (Fig. 1) guarantees that after the SINK has started the updating
cycle, each of the network nodes will eventually perform this step.
Furthermore, this is done in the order given by the tree from the SINK towards the
leaves.

Whenever a node i receives a message MSG(d) from a neighbor l,
it estimates and stores its distance through this neighbor to the SINK.
This distance is estimated as d + d_il. As said before, the reevaluation of
the estimated distance d_i is performed when receiving MSG from the
preferred neighbor p_i. Node i then calculates the minimum of the estimated
distances to the SINK through all those neighbors from which it has already
received MSG (during the present update cycle). The node then sets d_i to
this minimum. (Notice that d_i is only an "estimate" of the minimal
distance to the SINK because it is sometimes calculated based upon only part of the
neighbors of i.)

When a node, say i, has received MSG(d) from all of its neighbors,
it transmits MSG(d_i) to its p_i and then determines its new preferred
neighbor p_i. This is done by choosing as p_i the neighbor which provides
minimal estimated distance from i to the SINK. This choice is made among
all neighbors of i and as such it may pick a neighbor different from the
one which provided d_i (the calculation of the estimated distance d_i is
usually based upon part of the neighbors). Since, as previously shown, each
node i will eventually send MSG(d_i) to all its neighbors except p_i,
the leaves of the directed tree will eventually receive MSG from all their
neighbors. Thus they will send MSG to their preferred neighbor p_i and
reevaluate a new p_i. It can be easily seen by induction that each node
will perform this step. This happens in the order given by the original
directed tree, from the leaves towards the SINK.
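
The difference between the estimate d_i and the final choice of p_i can be
seen in a small worked example. The following Python sketch is only an
illustration under assumed data: the neighbor names "a", "b", "c", the link
distances, and the helper names estimate_distance and choose_preferred are
hypothetical and do not come from the paper.

```python
import math


def estimate_distance(D, received):
    """d_i: minimum of D_i(l) over the neighbors heard from so far in this cycle."""
    return min(D[l] for l in received)


def choose_preferred(D):
    """New p_i: neighbor with minimal estimated distance, taken over all neighbors."""
    return min(D, key=D.get)


# Hypothetical node i with neighbors "a" (current p_i), "b" and "c".
d_link = {"a": 2, "b": 1, "c": 1}        # link distances d_il
D = {l: math.inf for l in d_link}        # D_i(l), filled in as MSG(d) arrives

D["b"] = 8 + d_link["b"]                 # MSG(8) from b arrives first
D["a"] = 5 + d_link["a"]                 # MSG(5) from p_i = a triggers the reevaluation
d_i = estimate_distance(D, received={"a", "b"})

D["c"] = 3 + d_link["c"]                 # MSG(3) from c arrives last
p_i = choose_preferred(D)                # made only once all neighbors have reported
```

Here d_i is computed from a and b only, while the new preferred neighbor is
c, which reported a distance smaller than d_i. The estimate itself is
refreshed only in a later update cycle.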

Since the SINK denotes the destination, the SINK has no preferred
neighbor, and therefore the SINK does not update when it receives
MSG(d) from all its neighbors. Instead, this event notifies the SINK that
the update cycle has been properly completed. The SINK is not allowed to
start a new update cycle until the previous cycle has been properly completed.

A node i always updates its preferred neighbor p_i to point
towards a node j having estimated distance d_j < d_i. As proved in
Section 3, this fact ensures that the updated directed graph will remain
a tree at any time.

The basic protocol can be formally defined by the basic algorithm
performed by each node i. The latter is shown in Table 1 with the aid of
a Finite State Machine. Node i can be in either of two states. It will
be in state S2 after having received MSG from its preferred neighbor p_i
and until it receives messages from all its neighbors. Otherwise node i
will be in S1. The variables D_i(l), one for each neighbor l of i,
store the values of the estimated distance through link l to the SINK.
The variables N_i(l), one for each neighbor l of i, denote flags which
can take the value "RCVD" to mean that MSG(d) was received from link (i,l)
during the current cycle, or the value "nil" otherwise. CT is a control
flag which can take the values 0 or 1. We assume that when MSG(d)
arrives from link l, it is given to the algorithm in the format MSG(d,l).

When MSG(d,l) is processed, the flag N_i(l) is set to RCVD,
D_i(l) is calculated, CT is set to 0, and then the Finite State Machine
executes transitions until no more transitions are possible. Transition T12
can be executed if node i is in state S1 and Condition 12 is satisfied,
i.e. the algorithm is processing a MSG(d,l) with l = p_i and CT = 0.
If T12 is executed, then node i goes to state S2 and Action 12 is performed,
i.e. the estimated distance d_i is reevaluated and MSG(d_i) is transmitted to
each neighbor of i except the preferred neighbor p_i. In a similar way,
T21 is executed when node i is in state S2 and Condition 21 is satisfied,
in which case node i goes to state S1 and Action 21 is performed. The role
of CT is to ensure that T12 cannot be executed immediately after T21 (for
example, suppose node i is in state S1 and MSG(d, l = p_i) arrives after
messages have arrived for all other links of i. In this case, without CT,
the sequence of transitions T12, T21, T12 will be performed in contradiction
with the protocol). Notice that the sequence T12, T21 is permitted.


The use of the Finite State Machine for describing the relatively
simple basic algorithm may appear superfluous. Its importance will become
apparent when describing the more complex protocols and the proofs of their
properties.
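
The basic algorithm of this section can be sketched as a small simulation.
The Python code below is a reconstruction from the prose only (Table 1 is
not reproduced here), so it should be read as an illustration under stated
assumptions: the four-node network and its link distances are hypothetical,
a single global FIFO queue stands in for the per-link FIFO property, ties
in the choice of p_i are broken arbitrarily, and setting CT to 1 in Action
21 is our assumption about how the flag blocks a T12 directly after T21.

```python
from collections import deque

SINK = 0
# Hypothetical network; symmetric link distances d_il.
W = {(0, 1): 1, (0, 2): 5, (1, 2): 1, (1, 3): 4, (2, 3): 1}
dist = dict(W)
dist.update({(b, a): w for (a, b), w in W.items()})
nbrs = {}
for (a, b) in dist:
    nbrs.setdefault(a, []).append(b)
nbrs = {i: sorted(v) for i, v in nbrs.items()}


class Node:
    def __init__(self, i, p):
        self.i, self.p, self.d = i, p, float("inf")
        self.state, self.ct = "S1", 0
        self.D = {l: float("inf") for l in nbrs[i]}   # D_i(l)
        self.N = {l: None for l in nbrs[i]}           # N_i(l); None plays "nil"

    def receive(self, d, l, send):
        """Process MSG(d, l), then run transitions until none is possible."""
        self.N[l], self.D[l], self.ct = "RCVD", d + dist[(self.i, l)], 0
        progressed = True
        while progressed:
            progressed = False
            if self.state == "S1" and l == self.p and self.ct == 0:       # T12
                self.d = min(self.D[k] for k in self.N if self.N[k] == "RCVD")
                for k in nbrs[self.i]:
                    if k != self.p:
                        send(self.i, k, self.d)
                self.state, progressed = "S2", True
            elif self.state == "S2" and all(v == "RCVD" for v in self.N.values()):  # T21
                send(self.i, self.p, self.d)
                self.p = min(self.D, key=self.D.get)
                self.N = dict.fromkeys(self.N)        # reset all flags to nil
                self.state, self.ct, progressed = "S1", 1, True


def run_cycle(nodes):
    """One update cycle; returns True if the SINK heard from all its neighbors."""
    queue = deque((SINK, k, 0) for k in nbrs[SINK])   # MSG(d_SINK) = MSG(0)
    heard = set()
    while queue:
        src, dst, d = queue.popleft()
        if dst == SINK:
            heard.add(src)
        else:
            nodes[dst].receive(d, src, lambda s, t, v: queue.append((s, t, v)))
    return heard == set(nbrs[SINK])


nodes = {1: Node(1, 0), 2: Node(2, 0), 3: Node(3, 1)}   # initial tree toward SINK
completed = [run_cycle(nodes) for _ in range(3)]
pref = {i: n.p for i, n in nodes.items()}
est = {i: n.d for i, n in nodes.items()}
```

In this example the preferred neighbors reach the shortest-path tree after
two cycles and the distance estimates follow one cycle later, which is
consistent with the remark that d_i is only an estimate within a cycle.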


2.2  Handling  Failures  of  Links 


At our level of abstraction, the outage of a link is called "link
failure". Transient (or transmission) failures can be masked out by the
link protocol, and we are not concerned with them. If a link of the
directed tree fails, then all the nodes which are predecessors of this
link on the directed tree lose their route to the SINK, but they are unaware
of this fact at the time of the failure. For example, if link (7,8) of
Fig. 1 fails, nodes 6, 7 and 9 lose their route. Furthermore, if an update
cycle is started, node 7 will not be able to receive MSG(d, l = 8) and
therefore node 7, as well as nodes 6 and 9, will not be able to perform T12. In
such a case we would like to recover by finding an alternative route (e.g.
through node 5), but since the basic protocol allows changing the estimated
distance d_i and preferred neighbor p_i only after performing T12, there
is need to provide an extension to handle this situation. Two actions must
be taken by the extended protocol: first, to inform nodes 7, 6 and 9 not to
wait for triggering messages from the tree (and also that the existing tree
has no meaning for them anymore) and second, to allow those nodes to choose
their p_i whenever control messages from new cycles arrive. These features
are described in the sequel.

Whenever a node i discovers a failure of its link to the preferred
neighbor p_i, it sets p_i = nil and d_i = ∞ to mean that its estimated
distance to the SINK has become infinite. Then node i generates a special
message MSG(∞) which propagates backwards through the tree to the nodes
that lost their route, causing them also to set their best link to nil
and the estimated distance to infinite. The propagation backwards is done
as follows. Node i sends MSG(∞) to all its neighbors except p_i; if
a node j receives MSG(∞) from a link other than p_j, it stores it but
no other action is taken; if a node j receives MSG(∞) from p_j, then
it transmits MSG(∞) to all its neighbors except p_j and sets p_j = nil,
d_j = ∞. When a node i establishes p_i = nil, d_i = ∞, it is said to enter
state S3 (see Table 3).
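
The backward propagation of MSG(∞) can be sketched as follows. This is a
minimal Python illustration, not the algorithm of Table 3: Fig. 1 is not
reproduced here, so the topology below is a hypothetical one chosen only to
agree with the statement that failure of link (7,8) disconnects nodes 6, 7
and 9, and the function name propagate_failure is an assumption.

```python
from collections import deque


def propagate_failure(nbrs, pref, i):
    """Node i has detected failure of its link to pref[i].

    Returns (lost, new_pref): the nodes that enter state S3 and the updated
    preferred-neighbor map, with None standing for nil.
    """
    pref = dict(pref)
    queue, lost = deque(), set()

    def enter_s3(j):
        old = pref[j]
        pref[j] = None                     # p_j = nil (d_j = infinity is implied)
        lost.add(j)
        for k in nbrs[j]:
            if k != old:
                queue.append((j, k))       # MSG(infinity) from j to k

    enter_s3(i)
    while queue:
        src, dst = queue.popleft()
        if pref.get(dst) == src:           # arrived on dst's preferred link
            enter_s3(dst)
        # otherwise the message is only stored; no further action is taken
    return lost, pref


# Hypothetical topology; node 0 is the SINK and link (6,5) is not on the tree.
nbrs = {0: [5, 8], 5: [0, 6], 6: [5, 7], 7: [6, 8, 9], 8: [0, 7], 9: [7]}
pref = {5: 0, 6: 7, 7: 8, 8: 0, 9: 7}
lost, new_pref = propagate_failure(nbrs, pref, 7)
```

Node 5 also receives MSG(∞), from node 6, but not on its preferred link,
so it keeps its route; it is the natural point for later reattachment.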


The second part of the recovery, called "reattachment", consists of
choosing a new best link by those nodes i having p_i = nil. The
reattachment takes place if one of the following two situations occurs. One
possibility is that a node with p_i = nil receives on one of its links, l say,
a message MSG(d ≠ ∞) and the node is assured that this message was generated by
an update cycle that started after the failure that caused p_i = nil. A
second possibility is that at the time p_i is set to nil, such a message
has already been received at node i. The reattachment consists of setting
p_i = l, going to state S2 and effecting the same operations as in T12.
This, together with other mechanisms to be described, guarantees that if a
failure (or multiple failures) occurs, and if indeed a new update cycle is
started, all nodes physically connected to the SINK will eventually belong
to a non-disrupted directed tree rooted at the SINK.


As mentioned above, there is need to guarantee that reattachment
is performed only as a result of receiving a message generated by an
update cycle which started after the failure. This can be achieved by
numbering the update cycles with nondecreasing numbers as described below. Each
node i will have a counter number n_i which denotes the cycle number
currently handled by this node, and all messages transmitted by i will
carry n_i in addition to d_i, i.e. they will be MSG(n_i, d_i). The SINK
may increase its n_SINK before starting a new update cycle, as explained later.
A node i receiving MSG(m,d) on its p_i updates its n_i to equal m.
Now, reattachment is done by a node i with p_i = nil if an MSG(m,d) with
m > n_i is received (or was previously received).

When an MSG(m,d) is received from link l by node i, in addition
to storing d in D_i(l), there is need to remember also the value of m.
This can be saved in N_i(l), which can now take the values nil, 0, 1, 2, 3, ...
instead of nil and RCVD as in the basic algorithm.
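
The reattachment test can be written as a short predicate over the stored
values. The Python sketch below is an assumption-laden illustration: the
function name reattach_link is hypothetical, None stands for nil, and
picking the candidate link with the smallest stored distance when several
links qualify is our choice for the example, not a rule stated in the text.

```python
import math


def reattach_link(p_i, n_i, N, D):
    """Return the link to reattach through, or None if reattachment is not allowed.

    N[l] is the cycle number last received on link l (None for nil) and
    D[l] is the stored estimated distance through link l.
    """
    if p_i is not None:
        return None                         # only nodes with p_i = nil reattach
    fresh = [l for l, m in N.items()
             if m is not None and m > n_i and D[l] < math.inf]
    return min(fresh, key=D.get) if fresh else None


# Node with p_i = nil that last handled cycle 4.
N = {"a": 4, "b": 5, "c": None}
D = {"a": 3, "b": 9, "c": math.inf}
choice = reattach_link(None, 4, N, D)

# A newer cycle number carrying an infinite distance does not qualify.
no_choice = reattach_link(None, 4, {"a": 4, "b": 5}, {"a": 3, "b": math.inf})
```

Link "a" offers a shorter distance but carries the old cycle number, so it
may have been generated before the failure and is not used.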


If a failure occurs in a link not belonging to the directed tree,
no route is disrupted. However, if this link is connected to a node in
state S2, it is convenient to prevent T21 from happening at this node for
this update cycle. This will prevent nodes from updating routes based upon
information which is invalidated by the failure and, more important, will
preclude proper completion from happening. Thus, proper completion will
indicate to the SINK that the update cycle was completed without failures
interfering with the process. Prevention of T21 is accomplished by
introducing an additional state, S̄2, into which a node enters if a nonpreferred
link fails while the node is in S2. A node i will leave S̄2 whenever new
information is received on p_i (see Table 3).


The described protocol allows the SINK to behave as follows. If
an update cycle started with n_SINK = m completes properly, the SINK is
allowed to start the next update cycle with the same n_SINK. On the other
hand, the SINK may at any time increase n_SINK and start a new update cycle
with an n_SINK larger than those used before, even if previous cycles have
not been properly completed. (Notice that in any case the values of n_SINK
are non-decreasing with time.) As proved later, if a new update cycle is
started while increasing n_SINK, it will eventually "cover" all previous
cycles. Also, if failures do not occur for a long enough time, the new cycle
will be properly completed, and all failures will be recovered, i.e. for all
nodes i physically connected to the SINK, the directed graph of (i, p_i)
will form a tree rooted at the SINK.

Table 2 summarizes the variables used by the algorithm performed by
an arbitrary node i as its part of the protocol. F_i(l) denotes the status
of link l as considered by node i, i.e. F_i(l) = UP if l is considered
operational and F_i(l) = DOWN if l is considered unoperational. F_i(l) can
take also the value "READY", whose use will be described when dealing with the
problem of links becoming operational. At that time, the role of z_i(l) will
also become clear. The variable mx_i stores the value of the largest update
cycle number m of all the messages MSG(m,d,l) received by node i. The
rest of the variables and their use were already described. The local link
protocols controlling the operations of the links connected to node i may
relay to the algorithm performed by node i four types of messages, and they
are also summarized in Table 2. MSG denotes an updating message, FAIL(l)
denotes the detection of the failure of link l, and the remaining two will
be described later. The exact properties required from the local protocol
to ensure proper operation of the network protocol will be discussed in
Section 2.7.


Table 3 describes the generalized algorithm of node i for the
protocol which handles topological changes. The protocol as described up to now
is implemented by the algorithm of Table 3 if steps I.1, I.2.4, I.3.1,
I.4, II.1.5, II.2.5 and II.7.7 are ignored. These steps relate mainly to links becoming
operational and will be discussed in subsequent sections. Table 3 uses a
notation similar to the one of Table 1. States S1, S2 and transitions T12
and T21 are similar to those described in Table 1 for the basic algorithm.
State S3 denotes the situation where the node has p_i = nil, which results
from receiving a FAIL or a MSG with d = ∞ from p_i. State S̄2 denotes a
state similar to S2, but from which a transition T21 is precluded. As
previously described, the algorithm goes to state S̄2 if, while at S2,
a failure is detected from a link other than p_i. The "Facts" given in the
algorithm are displayed to help in its understanding and are proven in
Theorem 2 of Section 3. A Fact holds if the transition under which it
appears is performed.

2.3  Starting a New Update Cycle

There exist several procedures for starting a new update cycle and
setting the corresponding n_SINK in a way which satisfies the required
behaviour from the SINK as described in Section 2.2. Two of these procedures
are described next.

Version 1:  At given intervals of time, or as a result of the detection of
a change in the traffic pattern, the SINK increments n_SINK and starts a
new update cycle. This version may make use of a time-out to trigger
a new update cycle if the previous one is not properly completed within
a certain time. If a failure occurs after proper completion, there is no
direct triggering of a new update cycle, and thus recovery can be achieved
only whenever the SINK decides to start a new update cycle. In addition,
this version unnecessarily increments n_SINK for every update; hence an
unnecessarily large number of bits to represent n_SINK is required. These
two disadvantages are overcome by the next version.

Version 2:  In order to cope with changes in traffic patterns, after proper
completion of the previous update cycle, the SINK may start a new update
cycle with the same n_SINK. In addition, whenever a node i detects a
failure of a link attached to it, the node generates a special message
REQ(n_i) which is forwarded through the directed path of preferred links
to the SINK. If such a REQ(m) arrives at a node i having p_i = nil, the
REQ is discarded. In Section 3 it is shown that if a REQ(m1) is generated
and forwarded as mentioned above, then some REQ(m2), m2 ≥ m1, will actually
arrive at the SINK within finite time. Whenever a REQ(m) arrives at the SINK,
and if m = n_SINK, then n_SINK is incremented and a new update cycle is
started. This cycle will take care of recovery from the failure that
generated the REQ(m). If m < n_SINK, such a cycle was already started and
the REQ(m) can simply be ignored. (Notice that m cannot be larger than
n_SINK.) This version guarantees that if an update cycle with n_SINK = m
is started, either the cycle will be properly completed in finite time or
else a failure has occurred and a REQ(m) will arrive at the SINK. (This is
proved in Section 3.) Thus, there is no need for a time-out to make sure
that the SINK will not wait indefinitely for the proper completion of an
update cycle. The additions to the algorithm for implementing this version
are given in I.1 and I.2.4 of Table 3. In the rest of the paper, we assume
that this version is implemented, although most of the results are also
applicable to Version 1.
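
The REQ handling of Version 2 can be illustrated with a short Python sketch.
It is only a static illustration under stated assumptions, not the algorithm
of Table 3: the preferred neighbors are frozen in a dictionary p (in the
protocol they may change while a REQ is in transit), and the names route_req
and sink_on_req are hypothetical.

```python
def route_req(p, origin, sink="SINK"):
    """Follow preferred links from `origin` on a frozen snapshot.

    p maps each node to its preferred neighbor, or None for nil.
    Returns True if the REQ reaches the SINK, False if it is discarded
    at a node whose preferred neighbor is nil.
    """
    i = origin
    # The routing graph is loop-free, so len(p) + 1 hops always suffice.
    for _ in range(len(p) + 1):
        if i == sink:
            return True
        if p.get(i) is None:
            return False  # REQ discarded at a node with p_i = nil
        i = p[i]
    return False


def sink_on_req(n_sink, m):
    """Return (new n_SINK, whether a new update cycle is started)."""
    if m > n_sink:
        raise ValueError("m cannot be larger than n_SINK")
    if m == n_sink:
        return n_sink + 1, True   # increment and start a new cycle
    return n_sink, False          # m < n_SINK: such a cycle already started


p = {"a": "SINK", "b": "a", "c": None}
print(route_req(p, "b"), route_req(p, "c"))
print(sink_on_req(3, 3), sink_on_req(4, 3))
```

A second REQ carrying the same number m is ignored because n_SINK has
already moved past m, which is why several nodes reporting the same failure
trigger only one new cycle.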


2.4  Handling Links Becoming Operational

If link (i,l) is down, i.e. F_i(l) = F_l(i) = DOWN, and it becomes
operational, nodes i and l should coordinate the operations necessary
to bring the link up. Otherwise, a deadlock could occur, for instance, if
i sets F_i(l) = UP while at S2 and l sets F_l(i) = UP after performing
T21 of the same update cycle. In this case, i will not perform T21 until
receiving a message from l, and such a message will not be sent because
l already completed this update cycle, i.e. deadlock.


The coordination is achieved by having both nodes bring the link up
just before starting to perform their part of the same new cycle. This
is done in two steps. First, i and l compare n_i and n_l via the local
protocol and decide to bring up the link when starting to process the first
cycle with number strictly higher than max(n_i,n_l). This fact is remembered
at the nodes by setting z_i(l) and z_l(i) to max(n_i,n_l), as well as
F_i(l) and F_l(i) to "READY". In addition, N_i(l) and N_l(i) are set to
nil and REQ(z_i(l)) is generated by nodes i and l and forwarded to the
SINK in the same way as described in Section 2.3 (Version 2) for failures.
This will guarantee that an update cycle with n_SINK larger than z_i(l)
(and z_l(i)) will be started. This first step of the coordination at node
i is done by message WAKE(l) given by the local protocol to the algorithm.
The actions performed by the algorithm when receiving such a message are
described in I.4 of Table 3. The synchronization assumes that the execution
of WAKE(l) and WAKE(i) are simultaneously started at nodes i and l
respectively, in order to guarantee that z_i(l) = z_l(i). However, it may
happen that a failure occurs again in the link and one of the nodes succeeds
in completing the synchronization while the other node does not. The protocol
allows for such a situation and only requires that the link protocol ends
the synchronization (successfully or unsuccessfully) within finite time. If
the synchronization is unsuccessful, no action is taken by the node, and the
link will remain DOWN from this node's point of view. Section 2.7 gives a
more formal and complete list of the requirements that the link protocol
should satisfy.

The second step of bringing the link (i,l) up is done by node i
(i.e. F_i(l) is set from READY to UP) when node i receives MSG from link
l or when the node counter number n_i becomes larger than z_i(l). This
is represented respectively by I.3.1 and II.1.5, II.2.5, II.7.7 of Table 3.
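
The two-step link bring-up can be sketched in Python as follows. This is a
minimal illustration assuming both WAKE synchronizations succeed; the class
Node and its methods on_wake and maybe_bring_up are hypothetical names, and
the exchange of n_i and n_l over the local protocol is reduced to passing the
neighbor's counter as an argument.

```python
from dataclasses import dataclass, field

DOWN, READY, UP = "DOWN", "READY", "UP"


@dataclass
class Node:
    n: int = 0                             # node counter number n_i
    F: dict = field(default_factory=dict)  # link status F_i(l)
    z: dict = field(default_factory=dict)  # synchronization number z_i(l)
    N: dict = field(default_factory=dict)  # last cycle number heard, N_i(l)

    def on_wake(self, l, n_l):
        """First step: successful WAKE(l) synchronization.

        Returns the number m of the REQ(m) that the node generates.
        """
        self.z[l] = max(self.n, n_l)
        self.F[l] = READY
        self.N[l] = None  # nil
        return self.z[l]

    def maybe_bring_up(self, l, msg_from_l=False):
        """Second step: READY -> UP on MSG from l or once n_i > z_i(l)."""
        if self.F.get(l) == READY and (msg_from_l or self.n > self.z[l]):
            self.F[l] = UP


a, b = Node(n=2), Node(n=5)
print(a.on_wake("b", b.n), b.on_wake("a", a.n))  # both nodes agree on z
a.maybe_bring_up("b")                            # n_a = 2 is not > z: stays READY
a.n = 6                                          # a joins a later cycle
a.maybe_bring_up("b")
print(a.F["b"])
```

Because both ends store the same z value, neither of them treats the link as
UP during a cycle that the other end has already completed, which is the
deadlock described at the beginning of this section.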

2.5  The Algorithm for the SINK

The algorithm for the SINK, assuming that Version 2 of Section 2.3
is chosen, appears in Table 4. Most of the algorithm was already informally
discussed in previous sections. The main difference between the algorithm
for the SINK and that for an arbitrary node i is that the first does not
need to keep the following variables:
- p_i  (which is not defined for the SINK)

- d_i  (which is always 0 for the SINK)

- D_i(l)  (which is only needed to update d_i and p_i)

- mx_i  (n_SINK is always the largest update number)

- z_i(l)  (during WAKE synchronization, z_SINK(l) is always set to
  n_SINK = max(n_SINK,n_l))

In addition, the algorithm may receive a "START" message from the "outside
world" which will cause it to start a new cycle, provided that the last one
was properly completed. WAKE and REQ call also for the execution of the
Finite-State Machine, and as a result WAKE as well as REQ(m = n_SINK) will
cause an increment of n_SINK and a new update cycle will be started. States
S1 and S2 are similar to the corresponding states of the algorithm for an
arbitrary node i. However, S1 means for the SINK that the last update
cycle was properly completed, and S2 means that the current update cycle is
not yet completed. T12 and T22 represent the starting of a new update cycle
and T21 the proper completion. For the SINK there is no need for states
equivalent to S3 and S2̄.
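
The state behaviour of the SINK described above can be summarized by a small
Python sketch. It is a toy model, not a transcription of Table 4: the class
name SinkFSM, the log attribute and the hook on_all_neighbors_reported are
hypothetical, and the condition under which the SINK detects proper
completion is left outside the sketch.

```python
class SinkFSM:
    """Toy model of the SINK: S1 = last cycle completed, S2 = cycle running."""

    def __init__(self):
        self.state = "S1"
        self.n_sink = 0
        self.log = []  # (transition, cycle number), kept for illustration

    def _start_cycle(self):
        # T12 if the last cycle was completed, T22 if one is still running.
        name = "T12" if self.state == "S1" else "T22"
        self.log.append((name, self.n_sink))
        self.state = "S2"

    def on_start(self):
        # START from the outside world: honored only after proper completion,
        # and the cycle number is reused.
        if self.state == "S1":
            self._start_cycle()

    def on_req(self, m):
        # Only REQ(m = n_SINK) increments n_SINK and starts a new cycle.
        if m == self.n_sink:
            self.n_sink += 1
            self._start_cycle()

    def on_wake(self, n_l):
        # Successful WAKE synchronization with neighbor l.
        self.n_sink = max(self.n_sink, n_l) + 1
        self._start_cycle()

    def on_all_neighbors_reported(self):
        # Hypothetical hook standing in for the proper-completion condition.
        if self.state == "S2":
            self.log.append(("T21", self.n_sink))
            self.state = "S1"


fsm = SinkFSM()
fsm.on_start()
fsm.on_req(0)
fsm.on_all_neighbors_reported()
print(fsm.log)
```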


2.6  Initialization of the Protocol

Any arbitrary node i comes into operation in state S3, with node
counter number n_i = 0, preferred neighbor p_i = nil, and F_i(k) = DOWN
for all k. The value of the remaining variables is immaterial. From this
initial condition, the local protocol may try to wake the links and it
proceeds operating as defined by the algorithm (Table 3). The SINK comes
into operation in state S1, with n_SINK = 0 and F_SINK(k) = DOWN for all k,
and proceeds in the same way but according to the algorithm of Table 4.
2.7  Properties Required from the Local Protocol

On each link of the network there is a local protocol that is in
charge of exchanging messages between neighbors. Our main algorithm assumes
that the following properties hold for the local protocol:

2.7.1  All links are bidirectional (duplex).

2.7.2  d_il > 0 for all links (i,l) at all times.

2.7.3  If a message is sent by node i to a neighbor l, then in finite
       time, either the message will be received correctly at l or
       F_i(l) = F_l(i) = DOWN. Observe that this assumption does not preclude
       transmission errors that are recovered by the local protocol (e.g.
       "resend and acknowledgement").

2.7.4  Failure of a node is considered as failure of all links connected
       to it.

2.7.5  A node i comes up in state S3, with n_i = 0, p_i = nil, and
       F_i(l) = DOWN for all links (i,l).

2.7.6  The processor at node i receives messages from link (i,l) on a
       first-in-first-served (FIFO) basis.

2.7.7  A link (i,l) is said to have become operational as soon as the
       local protocol discovers that the link can be used. Links (i,l)
       and (l,i) become operational at the same time and, subject to the
       following restrictions, a WAKE "message" is delivered in this case
       to each of the processors i and l.

       WAKE(l) can be received at node i only if

       (a)  node l receives WAKE(i) at the same (virtual) time;

       (b)  there are no other outstanding messages on link (i,l) and on
            (l,i);

       (c)  F_i(l) = F_l(i) = DOWN.
2.7.8  If F_i(l) = DOWN, the only message that the processor at i can
       receive from l is WAKE(l).

2.7.9  (a)  If F_i(l) ≠ DOWN and F_l(i) ≠ DOWN and F_i(l) goes to DOWN,
            then F_l(i) goes to DOWN in finite time.

       (b)  If F_i(l) = F_l(i) = DOWN and F_i(l) goes to READY, then in
            finite time, either F_l(i) goes to READY or
            F_i(l) = F_l(i) = DOWN.

2.7.10 When two nodes i and l receive WAKE as described in 2.7.7, a
       "synchronization" between l and i is attempted. At either end the
       synchronization may or may not be successful (the latter because of
       a new failure). If it is successful, the node proceeds as in Step I.4
       of Table 3. If not, then no action is taken.


III.  PROPERTIES AND VALIDATION OF THE ALGORITHM

Some of the properties of the algorithm have already been indicated
in previous sections. Here we state them explicitly along with some of the
others. We start with properties that hold throughout the operation of the
network, some of them referring to the entire network at a given instant of
time and some to a given node or link as time progresses. Then recovery of
the network after topological changes is proved through a series of theorems,
and finally we state and prove the fact that the algorithm achieves shortest
weighted routes. We may point out that the most important features of the
algorithm are given in Theorems 1, 4, 5 and 6.

Before stating the main properties of the algorithm, we need several
definitions and notations:

S1, S2, S2̄, S3 = states of the Finite-State Machine.

PC(m) = time of proper completion with cycle counter number m.

S1[n] = state S1 with node counter number n_i = n, and similarly for
S2[n], S3[n], S2̄[n].

Whenever we want to refer to a quantity at a given time t we add the time
in parentheses (e.g. p_i(t) means preferred neighbor of node i at
time t, N_i(l)(t) means variable N_i(l) at time t, etc.)

s_i(t) = state and possibly node counter number of node i at time t.
Therefore we sometimes write s_i(t) = S3 for instance, and sometimes
s_i(t) = S3[n].

We use a compact notation to describe changes accompanying a
transition, as follows:

Txy[t,i,MSG(m1,d1,l1),SEND(m2,d2,l2),(n1,n2),(d1,d2),(p1,p2),(mx1,mx2)]  (1a)

will mean that transition from state Sx to state Sy takes place at time
t at node i caused by receiving MSG(m1,d1) from neighbor l1; in this
transition i sends MSG(m2,d2) to l2, changes its node counter number
from n1 to n2, its estimated distance to destination d_i from d1
to d2, its preferred neighbor from p1 to p2 and the largest update
counter number received up to now from mx1 to mx2. Similarly,

Txy[t,i,FAIL(l),SEND(m2,d2,l2),(n1,n2),(d1,d2),(p1,p2),(mx1,mx2)]  (1b)

denotes the same transition as above, except that it is caused by
receiving FAIL(l) from neighbor l. For simplicity, we delete all arguments
that are of no interest in a given description, and if for example n1 is
arbitrary we write (φ,n2) instead of (n1,n2). Similarly, if one of the
states is arbitrary, φ will replace this state. In particular observe
that

Tφ2[t,SINK,(φ,n2)]  (2)

means that an updating cycle with number n2 is started at time t and

T21[t,SINK,(n2,n2)]  (3)

means that proper completion of the cycle occurs at time t. If Txy[t],
then we use the notations:

t-  = time just before the transition,
t+  = time just after the transition.

We also use

[t,i,MSG(m,d,l)]  (4)

to denote the fact that a message MSG(m,d) is received at time t at i
from l, whether or not the receipt of the message causes a transition.

Finally, at a given instant t, we define the Routing Graph RG(t)
as the directed graph whose nodes are the network nodes and whose arcs are
given by the preferred neighbors p_i, namely there is an arc from node i
to node l if and only if p_i(t) = l. For example, the routing graph of
the network in Fig. 1a is given in Fig. 1b. In order to describe properties
of the RG(t), we also define an order for the states by S3 > S2 = S2̄ > S1.
Also, if Sx and Sy are states, then the notation Sx ≥ Sy means Sx > Sy
or Sx = Sy. For conceptual purposes, we regard all the actions associated
with a transition of the Finite-State Machine to take place at the time of
the transition.

Theorem  1 

At any instant of time, RG(t) consists of a set of disjoint trees
with the following ordering properties:

i)    the roots of the trees are the SINK and all nodes in S3;

ii)   if p_i(t) = l, then n_l(t) ≥ n_i(t);

iii)  if p_i(t) = l and n_l(t) = n_i(t), then s_l(t) ≥ s_i(t);

iv)   if p_i(t) = l and n_l(t) = n_i(t) and s_l(t) = s_i(t) = S1, then
      d_l(t) < d_i(t).

The proof of Theorem 1 is given in Appendix A. According to it, the
RG consists at any time of a set of disjoint trees, i.e. it contains no loops.
Observe that a tree consisting of a single isolated node is possible. The
algorithm maintains a certain ordering in a tree, namely that the
concatenation (n_i,s_i) is nondecreasing when moving from the leaves to the
root of a tree and in addition, for nodes in S1 and with the same node
counter number, the estimated distances d_i to the SINK are strictly
decreasing.
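
The statement of Theorem 1 can be read as an invariant that any snapshot of
the network must satisfy. The Python sketch below checks it on a snapshot
given as four dictionaries. The function name check_theorem1, the string
labels for the states ("S2bar" stands for S2 with an overbar) and the use of
None for nil are assumptions made only for this illustration.

```python
import math

# State order S3 > S2 = S2bar > S1.
RANK = {"S1": 1, "S2": 2, "S2bar": 2, "S3": 3}


def check_theorem1(p, n, s, d, sink="SINK"):
    """Return True if the snapshot is loop-free and satisfies i)-iv).

    p[i]: preferred neighbor of i (None for nil), n[i]: node counter number,
    s[i]: state label, d[i]: estimated distance to the SINK.
    """
    for i in p:
        is_root = p[i] is None
        # i) the roots are exactly the SINK and the nodes in S3
        if is_root != (i == sink or s[i] == "S3"):
            return False
        # Loop check: walking along preferred links must reach a root.
        seen, j = set(), i
        while p[j] is not None:
            if j in seen:
                return False
            seen.add(j)
            j = p[j]
        if is_root:
            continue
        l = p[i]
        if n[l] < n[i]:                          # ii)
            return False
        if n[l] == n[i]:
            if RANK[s[l]] < RANK[s[i]]:          # iii)
                return False
            if s[l] == s[i] == "S1" and not d[l] < d[i]:  # iv)
                return False
    return True


p = {"SINK": None, "a": "SINK", "b": "a", "c": None}
n = {"SINK": 1, "a": 1, "b": 1, "c": 1}
s = {"SINK": "S1", "a": "S1", "b": "S1", "c": "S3"}
d = {"SINK": 0, "a": 1, "b": 2, "c": math.inf}
print(check_theorem1(p, n, s, d))
```

Node c in the example is an isolated root in S3, which is the single-node
tree mentioned above.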

In  addition  to  properties  of  the  entire  network  at  each  instant  of 
time,  we  can  look  at  local  properties  as  time  progresses.  Some  of  the  most 
important  are  given  in  the  following  theorem  whose  proof  appears  in  Appendix  A 
(see  c)  and  d)  in  Theorem  A.l). 
Theorem 2

i)  For a given node i, the node counter number n_i is nondecreasing
and the messages MSG(m,d) received from a given neighbor have
nondecreasing numbers m.

ii)  Between two successive proper completions PC(m̲) and PC(m̄), for each
given m with m̲ ≤ m ≤ m̄, each node sends to each of its neighbors
at most one message MSG(m,d) with d < ∞.

iii)  Between two successive proper completions PC(m̲) and PC(m̄), for each
given m with m̲ ≤ m ≤ m̄, a node enters each of the sets of states
{S1[m]}, {S2[m], S2̄[m]} at most once.

iv)  All "Facts" in the formal description of the algorithm in Section 2
are correct.


A third theorem describes the situation in the network at the time
proper completion occurs:

Theorem 3

At PC(m), the following hold for each node i:

i)  If n_i = m, then s_i = S1 or s_i = S3.

ii)  If a message MSG(m,d) with d ≠ ∞ is on its way to i, then
s_i = S3 and n_i = m.

iii)  If either (n_i = m and s_i = S1) or n_i < m, then for all k such
that F_i(k) = UP, it cannot happen that {N_i(k) = m, D_i(k) < ∞}.

A combined proof is necessary to show that the properties appearing
in Theorems 1, 2, 3 hold. The proof uses a two-level induction, first
assuming properties at PC to hold, then showing that the other properties
hold between this and the next PC and finally proving that the necessary
properties hold at the next PC. The second induction level proves the
properties between successive proper completions by assuming that the
property holds until just before the current time t and then showing that
any possible change at time t preserves the property. The entire rigorous
procedure appears in Appendix A.

In  order  to  introduce  properties  of  the  algorithm  regarding  normal 
activity  and  recovery  of  the  network,  we  need  several  definitions. 

Definition 

We say that a link (i,l) is potentially working if F_i(l) ≠ DOWN
and F_l(i) ≠ DOWN, and a link (i,l) is working if F_i(l) = F_l(i) = UP.
Two nodes in the network are said to be potentially connected at time t
if there is a sequence of links that are potentially working at time t
connecting the two nodes. A set of nodes is said to be strongly connected
to the SINK if all nodes in the set are potentially connected to the SINK
and for all links (i,l) connecting those nodes, we have either
F_i(l) = F_l(i) = UP or F_i(l) = F_l(i) = DOWN.
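
The two notions of connectivity can be made concrete with a short Python
sketch. It assumes that the link statuses are given as a dictionary F with
F[(i, l)] standing for F_i(l), that a missing entry means DOWN, and that the
SINK itself is counted among the nodes whose mutual links are inspected; the
function names are hypothetical.

```python
from collections import deque


def potentially_connected(F, sink="SINK"):
    """Nodes reachable from the SINK over potentially working links."""
    reached, queue = {sink}, deque([sink])
    while queue:
        i = queue.popleft()
        for (a, b), status in F.items():
            if a != i or b in reached:
                continue
            # Potentially working: neither end regards the link as DOWN.
            if status != "DOWN" and F.get((b, a), "DOWN") != "DOWN":
                reached.add(b)
                queue.append(b)
    return reached


def strongly_connected(nodes, F, sink="SINK"):
    """True if `nodes` is strongly connected to the SINK."""
    members = set(nodes) | {sink}
    if not members <= potentially_connected(F, sink):
        return False
    for (a, b), status in F.items():
        if a in members and b in members:
            # Each link among the members must be UP/UP or DOWN/DOWN.
            if status == "READY" or F.get((b, a), "DOWN") != status:
                return False
    return True


F = {
    ("SINK", "a"): "UP", ("a", "SINK"): "UP",
    ("a", "b"): "READY", ("b", "a"): "READY",
    ("b", "c"): "DOWN", ("c", "b"): "DOWN",
}
print(sorted(potentially_connected(F)))
print(strongly_connected({"a"}, F), strongly_connected({"a", "b"}, F))
```

In the example, node b is potentially connected to the SINK but not yet
strongly connected, because link (a,b) is still in the READY phase.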

Definition 

Consider a given time t, and let m1 be the highest counter
number of cycles started before t. We say that a pertinent topological
change happens at time t if the algorithm at a node i with n_i(t-) = m1
receives at time t a message WAKE(l) resulting in successful WAKE
synchronization or a message FAIL(l). Observe from steps I.2 and I.4 of
Table 3 that REQ(m1) is generated and sent if and only if a pertinent
topological change happens at a node i with p_i ≠ nil. Also note that a
pertinent topological change happens if and only if node i has a link
(i,k) such that at time t, F_i(k) changes from DOWN to READY or from
either UP or READY to DOWN (see Fig. 2).
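
The definition can be restated as a predicate on a single status change of
F_i(k). The Python sketch below is only an illustration: the function names
are hypothetical, and the value of p_i passed to generates_req is assumed to
be the one seen by the step of Table 3 that decides whether to send the REQ.

```python
DOWN, READY, UP = "DOWN", "READY", "UP"


def is_pertinent_change(old_status, new_status, n_i, m1):
    """True if F_i(k) going old -> new at a node with n_i(t-) = m1 is pertinent.

    m1 is the highest counter number of cycles started before t.
    """
    went_ready = old_status == DOWN and new_status == READY
    went_down = old_status in (UP, READY) and new_status == DOWN
    return (went_ready or went_down) and n_i == m1


def generates_req(old_status, new_status, n_i, m1, p_i):
    """REQ(m1) is generated only for a pertinent change at a node with p_i != nil."""
    return is_pertinent_change(old_status, new_status, n_i, m1) and p_i is not None


print(is_pertinent_change(DOWN, READY, 3, 3))   # successful WAKE
print(is_pertinent_change(READY, UP, 3, 3))     # second bring-up step
print(generates_req(UP, DOWN, 3, 3, None))      # node has p_i = nil
```

The step from READY to UP is not pertinent: it is part of a cycle that is
already running and does not call for a new one.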
Theorem 4 (Normal activity)

Let

L(t) = {nodes potentially connected to SINK at time t},

H(t) = {nodes strongly connected to SINK at time t}.

Suppose

Tφ2[t1,SINK,(m1,m1)]  (5)

namely a cycle is started at t1 with a number that was previously used.
Suppose also that no pertinent topological changes have happened while
n_SINK = m1 before t1 and no such changes happen for long enough time
after t1. Then there exist t0, t2, t3 with t0 < t1 < t2 < t3 < ∞ such
that a), b), c), d) hold:

a)  T21[t0,SINK,(m1,m1)];  (6)

b)  for all t in the interval [t0,t3], we have H(t) = L(t) = L(t0);

c)  for all i ∈ L(t0), we have

    Tφ2[t2_i,i,(m1,m1)]  (7)

    for some time t2_i in the interval [t1,t2];

d)  i)  T21[t3,SINK,(m1,m1)];  (8)

    ii)  RG(t3) for all nodes in L(t0) is a single tree rooted at SINK.

In words, Theorem 4 says that under the given conditions, if a new
cycle starts with a number that was previously used, then Proper Completion
with the same number has previously occurred and the new cycle will be
properly completed in finite time while connecting all nodes of interest (i.e.
in L(t0)) to the SINK, both strongly and routingwise. The proof of
Theorem 4 is given in Appendix B.

The recovery properties of the algorithm are described in
Propositions 1, 2 and in Theorem 5. The proofs of the propositions appear in
Appendix B.




Proposition  1 

Let L(t), H(t) be as in Theorem 4. Suppose

Tφ2[t1,SINK,(m1,m2)];  m2 > m1,  (9)

namely a cycle starts at time t1 with a number that was not previously
used. Suppose also that no pertinent topological changes happen for a
long enough period after t1. Then

a)  there exists a time t2, with t1 ≤ t2 < ∞, such that

    i)  for all i ∈ L(t2)

        Tφ2[t2_i,i,(φ,m2)]  (10)

        happens at some time t2_i with t1 ≤ t2_i ≤ t2;

    ii)  H(t2) = L(t2).

b)  There exists a time t3 < ∞ such that

    i)  T21[t3,SINK,(m2,m2)];  (11)

    ii)  for all t in the interval [t2,t3], we have H(t) = L(t) = H(t2);

    iii)  RG(t3) for all nodes in L(t3) is a single tree rooted at
          SINK.

Part a) of Proposition 1 says that under the stated conditions, all nodes
in L(t2) will eventually enter state S2[m2]. Part b) says that the cycle
will be properly completed and all nodes potentially connected to the SINK
at time PC(m2) will actually be strongly connected to the SINK and will also
have a routing path to the SINK.

Finally, we observe that reattachment of a node losing its path to
the SINK or bringing a link up requires a cycle with a counter number higher
than the one the node currently has. Proposition 2 ensures that such a cycle
has been or will be started in finite time by the SINK.

Proposition 2

Suppose that at time t a node i receives FAIL(l) while n_i = m1, or a
successful WAKE(l) synchronization occurs at node i while z_i(l) = m1.
Then the SINK has received before t a message REQ(m1) or will receive such
a message in finite time after t.

Propositions 1 and 2 are combined in:


Theorem  5 (Recovery  theorem) 

Let L(t), H(t) be as in Theorem 4. Suppose there is a time t1
after which no pertinent topological changes happen in the network for long
enough time. Then there exists a time t3 with t1 ≤ t3 < ∞ such that
all nodes in L(t3) are strongly connected to the SINK and are on a single
tree rooted at SINK.

Proof

Let t0 ≤ t1 be the time of the last pertinent topological change
before t1. Let i be the node detecting it and let m = n_i(t0-). Then
Proposition 2 assures that a message REQ(m) arrives at some finite time at
SINK. Let t2 < ∞ be the time the first REQ(m) message arrives at SINK.
Condition 12 or 22 in Table 4 dictates that SINK will start at time t2 a
new cycle, with number m1 = m + 1. Since by the definition of pertinent
change, m is the largest number at time t0, we have that t0 < t2. By
assumption, no pertinent topological changes happen after time t0 for
a long enough period, so that no such changes happen after time t2.
Consequently Proposition 1 holds after this time and the assertions of the
Theorem follow.

Theorem  6 (Shortest  paths) 

With the notations of Theorem 5, suppose the conditions of Theorem 5
hold and in addition, suppose that the weights d_il of the links are time
invariant for a long enough period after t1. Then, after completion of a
finite number of cycles after t3, the routing graph RG will provide the
shortest route in terms of the weights d_il from each node in L(t3) to
the SINK. Let SR be the graph providing the shortest routes in terms of
d_il. Then the necessary number of cycles is bounded from above by the
largest distance from SINK in terms of number of hops on SR.

Proof 

Observe from steps II.1.3 and II.3.7 in Table 3, that during the
first cycle after t3 all nodes closest to SINK on SR will have p_i = SINK
and will never change p_i afterwards.

Next, consider any connected subgraph A of SR that includes the
SINK. Suppose that at the time of a cycle completion SR and RG coincide
for nodes in A. Then these nodes will never change their preferred neighbors
afterwards. Also during the next cycle at least the nodes neighboring A
on SR will change their p_i such that RG and SR will coincide for them too,
and this proves the assertion.
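
The hop-count bound of Theorem 6 can be illustrated on a deliberately
simplified model. The Python sketch below is not the protocol of Table 3: it
assumes a synchronous Bellman-Ford-style relaxation in which, during each
cycle, every node recomputes its distance from the distances its neighbors
held at the end of the previous cycle, starting from d_i = ∞ everywhere
except at the SINK. The function name cycles_to_converge and the example
network are hypothetical.

```python
import math


def cycles_to_converge(weights, sink="SINK"):
    """Count synchronous relaxation cycles until the distances stop changing.

    weights[(i, l)] is the weight d_il of link (i,l), given in both
    directions. Returns (number of cycles that changed something, distances).
    """
    nodes = {i for i, _ in weights}
    d = {v: (0 if v == sink else math.inf) for v in nodes}
    cycles = 0
    while True:
        new = {}
        for v in nodes:
            if v == sink:
                new[v] = 0
            else:
                # Best neighbor according to the previous cycle's distances.
                new[v] = min(w + d[l] for (i, l), w in weights.items() if i == v)
        if new == d:
            return cycles, d
        d, cycles = new, cycles + 1


links = [("SINK", "a", 1), ("a", "b", 1), ("SINK", "b", 5), ("b", "c", 1)]
weights = {}
for i, l, w in links:
    weights[(i, l)] = weights[(l, i)] = w

print(cycles_to_converge(weights))
```

In this example the shortest route from c is c, b, a, SINK, so the largest
hop distance on SR is 3, and the relaxation settles after 3 cycles: first a,
then b, then c, mirroring the growth of the subgraph A in the proof.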



IV.  DISCUSSION AND CONCLUSIONS


The paper presents an algorithm for constructing and maintaining
loop-free routing tables in a data network, when arbitrary failures and
additions happen in the network. Clearly, the properties that are rigorously
proved in Section 3 and the Appendices hold also for several other versions
of the algorithm, some of them simpler and some of them more involved than
the present one. We have decided on the present form of the algorithm as a
compromise between simplicity and still keeping some properties that are
intuitively appealing. For example, one possibility is to increase the
update cycle number every time a new cycle is started. This will not simplify
the algorithm, but will greatly simplify the proofs. On the other hand, it
will require many more bits for the update cycle and node numbers m and
n_i than the algorithm given in the paper. Another version of the algorithm
previously considered by us was to require that every time a node i
receives a number higher than n_i from some neighbor, it will "forget" all
its previous information and will "reattach" to that node immediately, by a
similar operation to transition T32. This change in the algorithm would
considerably simplify both the algorithm and the proofs, but every
topological change will affect the entire network, since after any
topological change, all nodes will act as if they had no previous
information. On the other hand, the version given in the paper "localizes"
failures in the sense that only those nodes whose best path to SINK was
destroyed will have to forget all their previous information. This is
performed in the algorithm by requiring that nodes not in S3 will wait for a
signal from the preferred neighbor p_i before they proceed, even if they
receive a number higher than n_i from other neighbors. The signal may be
either ∞, in which case the node enters S3 (and eventually reattaches), or
less than ∞, in which case the node proceeds as usual.




A final remark concerns the amount of control information required
by the protocol. Observe that since for each update and for each
destination each node sends over each link the distance d_i and the node
counter number n_i, the amount of information sent over each link is of the
same order of magnitude as the ARPA routing protocol [7]. The difference is
that the latter allows information for all destinations to be sent in one
message, whereas our protocol requires in principle separate messages for
different destinations (although sometimes several messages may be packed
together). If the overhead for control messages is not too large however,
the extra load will not be significant.
Appendix A

We organize the proofs as follows: we start with the statements of
a few properties that follow immediately from the formal description of the
algorithm in Table 3. Lemmas A.1 - A.4 and Theorem A.1 contain the proofs
of Theorems 1, 2 and 3, together with some other properties needed in the
proofs themselves. Theorem 4 and Propositions 1 and 2 will be proved in
Appendix B.

Properties  of  the  Algorithm 

R1  Any change in n_i, s_i, p_i, or sending any message (m,d) can happen
only while i performs a transition.

R2  Txy[t,i,SEND(m,d),(φ,n2),(φ,d2),(φ,mx2)] implies d = d2.

If d ≠ ∞, then

i)  Txy = T12 or T21 or T22 or T32 or T2̄2

ii)  n2 = mx2 = m

If d = ∞, then

iii)  Txy = T13 or T23 or T2̄3

iv)  n2 = m.

R3  T32[t,i,(n1,n2)]  ⟹  n2 > n1

R4  s_i(t) = S3  ⟺  p_i(t) = nil  ⟺  d_i(t) = ∞

R5  Txy[t,i,(p1,p2)], p1 ≠ nil, p2 ≠ p1  ⟹  Txy = T13 or T21 or T23 or T2̄3.

R6  mx_i(t) is nondecreasing in time for any i.

R7  In the Finite-State Machine, no two conditions can hold at the same time.
This implies that the order of checking the conditions of the transitions
is irrelevant.

R8  For all t and all nodes i in the network, n_SINK(t) ≥ n_i(t) and
n_SINK(t) ≥ mx_i(t).
R9  The Finite-State Machine has two types of transitions. The first type
is effected directly by the incoming message, while the second type is
caused by the situation in the memory of the node. Transitions T23
and T21 are of the second type, all others are of the first type. Each
message can trigger only one transition of the first type, and this
transition comes always before transitions of the second type. This
is controlled by the variable CT in Table 3.

R10  The possible changes of F_i(l) are given in Fig. 2. The types of
messages causing them are also shown. A pertinent topological change
happens iff F_i(l) goes to DOWN or F_i(l) changes from DOWN to READY at a
node i with n_i(t-) = m1, where m1 is the highest counter number of
cycles started before t.

The following lemma says that the node number n_i can be changed
only when receiving a message from the preferred neighbor and then, the
new number is exactly the cycle number m received in that message. It also
gives conditions for leaving state S3.

Lemma A.1

If

Txy[t,i,MSG(m,d,l),(n1,n2),(p1,φ)]  (A.1)

or

Txy[t,i,FAIL(l),(n1,n2),(p1,φ)]

then

a)  p1 ≠ nil, n2 ≠ n1  ⟹  l = p1 and n2 = m;

b)  p1 = nil  ⟹  n2 > n1, and also there exists k such that
    F_i(k)(t-) = UP, N_i(k)(t-) = n2.

Proof

a)  From the algorithm we see that T21, T22, T22 do not apply here since
they imply n2 = n1. Also T32 does not apply, since then p1 = nil.

If T13, T23 or T2̄3 is caused by FAIL(ℓ) then n2 = n1, so this
case does not apply either. In all other cases, n2 = m and p1 = ℓ
(see II.1.4, II.2.1, II.2.4 in Table 3).

b)  p1 = nil implies Txy = T32 and the assertion follows from steps
II.7.1 and II.7.5 in Table 3.

The next lemma proves statement i) of Theorem 2 and shows the role of
the node counter number n_i. Here we see for the first time that several
properties have to be proved in a common induction.

Lemma A.2

a)  [i,t1,MSG(m1,d1,ℓ)], [i,t2,MSG(m2,d2,ℓ)], t2 > t1 implies m2 ≥ m1.

b)  Tφφ[t,i,(n1,n2)] implies n2 ≥ n1.

c)  Let M_i(t,p_i(t)) denote the counter number m of the last message
MSG(m,d) received at i before or at time t from the preferred
neighbor p_i(t). Then

        n_i(t) ≤ M_i(t,p_i(t))                                   (A.2)

Proof 

The  proof  proceeds  by  induction.  We  assume  that  a),  b),  c)  hold  up 
to,  but  not  including,  time  t for  all  nodes  in  the  network.  We  then  prove 
below  that  any  possible  event  at  time  t preserves  the  properties.  This, 
combined with the fact that a), b), c) hold trivially at the time any
node comes up for the first time, completes the proof.

a)  Suppose t = t2. Then by FIFO and property R2, there exist
t3 < t4 < t such that n_ℓ(t3) = m1 and n_ℓ(t4) = m2. By induction
hypothesis on b), n_ℓ was nondecreasing up to (but not including)
time t, so m1 ≤ m2.
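
The argument for a) combines three ingredients: the sender's counter never decreases, each message carries the sender's current counter, and the link delivers in FIFO order. A minimal sketch of that combination follows; the function names and the input list are hypothetical, and clamping with max() simply models a nondecreasing counter rather than any step of Table 3.

```python
from collections import deque

def deliver_in_fifo_order(counter_updates):
    """Send one message per update over a FIFO link and return what is received."""
    link = deque()
    n = 0
    for proposed in counter_updates:
        n = max(n, proposed)   # the sender's counter never decreases
        link.append(n)         # each message carries m equal to the current counter
    received = []
    while link:
        received.append(link.popleft())  # FIFO delivery on the link
    return received

def is_nondecreasing(values):
    return all(a <= b for a, b in zip(values, values[1:]))

received = deliver_in_fifo_order([1, 1, 3, 2, 5])
```

Under these assumptions the received counter numbers are nondecreasing, which is the content of a).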

b)  Observe first from steps II.2.4 and II.5.1 in Table 3,

        Tφφ[t,i,FAIL(ℓ),(n1,n2)]

implies n2 = n1, so that the statement is true in this case. We
therefore have to check only the case when the transition is caused
by MSG. Suppose

        Tφφ[t,i,MSG(m,d,ℓ),(n1,n2),(p1,p2)]                      (A.3)

happens. If n2 = n1, q.e.d. If n2 ≠ n1, then Lemma A.1 implies
that either p1 = nil or (p1 = ℓ, n2 = m). If p1 = nil, q.e.d.
from Lemma A.1. If (p1 = ℓ, n2 = m), then

        n1 ≤ M_i(t-,p1) ≤ M_i(t,ℓ) = m = n2                      (A.4)

where the inequalities follow respectively from induction hypothesis
on c) and from applying a) at time t.


c)  We have to show that if

        [i,t,MSG(m,d,ℓ),(n1,n2),(p1,p2)]

then

        i)  ℓ = p1 = p2 implies n2 ≤ m, and
        ii)  p2 ≠ p1, p2 ≠ nil implies n2 ≤ M_i(t+,p2)           (A.5)

To do this we check all possible transitions and also the case when the
received message causes no transition. T13, T23 and T2̄3 do not apply here
because then p1 ≠ nil, p2 = nil. If T22̄ or no transition, then p2 = p1 and
n2 = n1, and we have

        n2 = n1 ≤ M_i(t-,p1) ≤ M_i(t+,p1) = M_i(t+,p2)           (A.6)

where the inequalities follow from the induction hypothesis and from
a) respectively. For the other transitions we have

T12, T22 and T2̄2 imply ℓ = p1 = p2, n2 = m (see II.1.1 and II.1.4
in Table 3).

T21 implies p2 ≠ nil, and then the counter number of the last
message received from any neighbor before t+ is
n1 = n2 = m.

T32 implies p2 ≠ p1, p2 ≠ nil and then from steps II.7.4, II.7.5,
II.7.1 in Table 3, n2 = mx_i(t-), p2 = k*, M_i(t+,k*) =
N_i(k*)(t-) = mx_i(t-).

The next lemma shows which messages can travel on a line after
a failure or after a message with d = ∞.

Lemma A.3

a)  If

        [i,t1,MSG(m1,d1,ℓ)], [i,t2,MSG(m2,d2,ℓ)]                 (A.7a)

where t2 > t1, d1 = ∞, then m2 > m1.

b)  If

        [i,t1,FAIL(ℓ)], [i,t2,MSG(m2,d2,ℓ)]                      (A.7b)

where t2 > t1, then m2 > n_i(t1) and also m2 > n_ℓ(t1).

Proof

a)  ∃t3 < t1 such that

        Tφ3[t3,ℓ,SEND(m1,d1,i),(φ,n2)]                           (A.8)

and from property R2 we have m1 = n2. The next transition of ℓ must
be

        T32[ℓ,(n2,n3)], n3 > n2

so that by Lemma A.2 b) and R2, node ℓ will never send after t3 any
message MSG(m,d) with m ≤ m1. FIFO at node i completes the proof.

b)  After failure, a link (i,ℓ) can be brought up only with numbers
strictly higher than z_i(ℓ) as defined in step I.4 of Table 3.
Since n_i is non-decreasing by Lemma A.2 b), the proof is complete.

Lemma A.4

If F_i(ℓ)(t) = READY and

        [t,i,MSG(m,d,ℓ)]                                         (A.9)

then m > z_i(ℓ)(t). Observe that this is Fact I.3.1 in Table 3.

Proof

From steps I.1-I.4 in Table 3 and property 2.7.7 in Sec. 2.7, F_i(ℓ) can
go to READY only from DOWN and only when successful synchronization of
WAKE(ℓ) occurs at i. Let t1 < t be the time this occurs. By property
2.7.7, at time t1 there are no outstanding messages on (i,ℓ) or (ℓ,i)
and z_i(ℓ) is established as max{n_i, n_ℓ} (see I.4 in Table 3). Therefore
the message in (A.9) must have been sent at time t2 > t1 and since ℓ
sends messages only to nodes k for which F_ℓ(k) = UP it follows that
F_ℓ(i)(t2+) = UP. But F_ℓ(i) could have gone to UP from READY only because
of II.1.5, II.2.5, II.4.2, II.6.2, II.7.7, II.8.2 or II.9.2 in Table 3, and
not because of I.3, and in all the above we have n_ℓ > z_ℓ(i) = z_i(ℓ). Since
n_ℓ is nondecreasing and ℓ sends MSG(m,d) only with m = n_ℓ, the assertion
follows.
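
The two facts used in this proof, that z is fixed as the maximum of the two node counters at synchronization and that a READY link goes UP only with a strictly higher counter, can be sketched as follows. The function names are hypothetical and the sketch ignores everything else that the corresponding steps of Table 3 do.

```python
DOWN, READY, UP = "DOWN", "READY", "UP"

def synchronize(n_i, n_l):
    """Value of z for the link, as established at wake-up: max of both counters."""
    return max(n_i, n_l)

def link_status_after_update(status, n, z):
    """A READY link goes UP only when the node's counter strictly exceeds z."""
    if status == READY and n > z:
        return UP
    return status

z = synchronize(3, 5)
```

Since a node sends MSG(m,d) only with m equal to its own counter and only over UP links, any message sent after the link came up carries m > z under these assumptions.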


Lemma A.5

If

        Tφ2[t1,i,(φ,m)],                                         (A.10)

then ∀t > t1, we have ∀k s.t. F_i(k)(t) = READY that z_i(k)(t) ≥ m.
Therefore, no link can be brought up by node i with number m after the
node entered S2[m] (brought up means F_i(k) ← UP).

Proof

If we have F_i(k)(t1-) = READY and z_i(k)(t1-) < m, then at time
t1, we have F_i(k) ← UP. If it is not, then ∀t > t1, we have n_i(t) ≥ m
by Lemma A.2, so that only for nodes k with z_i(k) ≥ m can it happen that
F_i(k) ← READY after t1.

The  next  theorem  completes  the  proof  of  Theorems  1,  2 and  3. 

Theorem A.1

Let PC(m), PC(m) be the instants of occurrence of two successive
proper completions. Then

a)  Theorem 3.

b)  Consider any number m1 ≤ m. Let m be the highest number m ≤ m1
such that PC(m) occurs. Let LPC(m,m1) be the time of occurrence
of the last PC(m) such that PC(m) ≤ PC(m). If for any i,k,
t ≤ PC(m), we have either

        N_i(k)(t) = m1 = m, D_i(k)(t) ≠ ∞, s_i(t) ≠ S3, n_i(t) = m   (A.11a)

or

        N_i(k)(t) = m1 > m,                                      (A.11b)

then ∃τ1 ∈ [LPC(m,m1),t) and τ2 ∈ (τ1,t) such that

        [τ1,k,SEND(m1,d1,i)]                                     (A.12a)

        [τ2,i,MSG(m1,d2,k)]                                      (A.12b)

with d1 = D_i(k)(t) - d_ik(τ2), d2 = D_i(k)(t).

(Note: In words, the above insures that the message (m1,d1) was
sent and received no earlier than LPC(m,m1).)

c)  Consider any number m1 ≤ m. Let m be the highest number m ≤ m1
such that PC(m) occurs. Let LPC(m,m1) be the time of occurrence
of the last PC(m) such that PC(m) ≤ PC(m). Then

i)  [t1,i,MSG(m1,d1,ℓ)], [t2,i,MSG(m2,d2,ℓ)] where
LPC(m,m1) ≤ t1 < t2 ≤ PC(m) and d2 ≠ ∞ imply m2 > m1.

ii)  If

        T21[t1,i,(n1,n1)]                                        (A.13)

        [t2,i,MSG(m,d,ℓ)], d ≠ ∞                                 (A.14)

where LPC(m,n1) ≤ t1 < t2 ≤ PC(m), then m > n1.

iii)  A node i enters, between LPC(m,m) and PC(m), each of
the following sets of states at most once:

        {S1[m]}, {S2[m], S2̄[m]}, {S3[m]}.

d)  All  "Facts"  in  Table  3 are  correct. 

e)  i)  The possible transitions at a node are the following, where
n2 ≥ n1 and n3 > n1: T12[(n1,n2)], T13[(n1,n2)], T21[(n1,n1)],
T22[(n1,n3)], T22̄[(n1,n1)], T23[(n1,n2)], T2̄3[(n1,n2)],
T32[(n1,n3)], T2̄2[(n1,n3)].

ii)  T21[t,i,(n1,n1)], p_k(t) = i implies s_k(t) = S1[n1].

f)  Theorem  1. 

g)  i)  Suppose T21[t,i,(n1,n1)] with n1 = m and let τ1 be the
last time before t such that Tφ2[τ1,i,(φ,n1)]. Then we
have F_i(k)(τ1) = UP if and only if F_i(k)(τ) = UP,
∀τ ∈ [τ1,t].

ii)  If for some t ∈ [PC(m),PC(m)] we have

        Tφ2[t,i,(φ,n2)], n2 = m,                                 (A.15)

then

        ∃τ1 ∈ (t,PC(m)) s.t. T21[τ1,i,(n2,n2)]

and

        ∄τ2 ∈ (t,PC(m)) s.t. T2̄3[τ2,i] or T22̄[τ2,i].            (A.16)

h)  If ∃i,k, t ∈ (PC(m),PC(m)) such that for some τ ∈ (PC(m),t] we
have

        [τ,k,SEND(m,d,i)], d ≠ ∞

and if i either has not received this message by time t or has
N_i(k)(t) = m, D_i(k)(t) ≠ ∞, then ∃τ1 ∈ [t,PC(m)] such that

        s_i(τ1) = S2[m] or s_i(τ1) = S3[m].                      (A.17)

Proof 

As said before, the proof proceeds using a two-level induction. We
first notice that a) holds at the time the network comes up for the first
time. We call this PC(0). Then we assume that a)-h) hold at every time
up to and including PC(m). Next we prove that b)-h) hold until PC(m) and
then show that a) holds at PC(m).

b)  Observe that from Lemma A.2 b) and Property R8, by time LPC(m,m1)
no node in the network has ever heard of a number > m. Therefore
if (A.11b) holds, an appropriate message must have been sent and
received after LPC(m,m1) and hence (A.12) holds.

On the other hand, observe that (A.11a) and Property R3 imply
that s_i(LPC(m,m1)) ≠ S3[m]. Also note that the induction
hypothesis assumes that a), namely Theorem 3, holds at time LPC(m,m1)
and therefore at this time, first, no message MSG(m,d) with d ≠ ∞
is on its way to i and second, it cannot happen that {N_i(k) = m,
D_i(k) ≠ ∞}. But (A.11a) says that the latter occurs at time t
and therefore, by step I.3 in Table 3, i must have received
a message MSG(m,d) with d ≠ ∞ after LPC(m) and hence (A.12b).
Since no such message was on its way to i at LPC(m), (A.12a) holds
also.

c)  Suppose c) i), ii) and iii) are true for all nodes in the network
up to time t-. We prove c) i) and c) ii) for t2 = t and then
prove c) iii) for t.

i)  If d1 = ∞, then m2 > m1 from Lemma A.3. It remains to prove
the assertion for d1 < ∞. From Lemma A.2, we have m2 ≥ m1.
Suppose d1 ≠ ∞ and m2 = m1. Then Lemmas A.3a) and A.2a)
respectively, imply that ∄τ3 ∈ (t1,t) such that
[i,τ3,MSG(d3 = ∞,ℓ)] or such that [i,τ3,MSG(m3,d3,ℓ)],
m3 ≠ m2 = m1. Therefore the two messages received at t1 and
t2 = t can be taken as consecutive. So using b),
∃τ4 ∈ [LPC(m,m1),t1), τ5 ∈ (τ4,t) such that

        Txy[τ4,ℓ,SEND(m1,d1,i)], d1 ≠ ∞,                         (A.18)

        Tαβ[τ5,ℓ,SEND(m1,d2,i)], d2 ≠ ∞,                         (A.19)

where Txy = T21 or T12 or T32 or T22 or T2̄2 and same
for Tαβ. But by induction hypothesis on c) iii), node ℓ cannot
enter the set of states {S2[m1], S2̄[m1]} twice between LPC(m,m1)
and t, so that the only possibilities are
{T12[τ4,ℓ] OR T32[τ4,ℓ] OR T22[τ4,ℓ] OR T2̄2[τ4,ℓ]} AND
{T21[τ5,ℓ]} and no other transition happens between τ4 and τ5.
But in Tφ2[τ4,ℓ], node ℓ sends a message to every neighbor
except p_ℓ(τ4+) and in T21[τ5,ℓ] it sends a message only to
p_ℓ(τ5-) and since no other transition happens between τ4 and
τ5 we have p_ℓ(τ4+) = p_ℓ(τ5-). This contradicts (A.18), (A.19).

ii)  If F_i(ℓ)(t1-) = DOWN or READY, then Lemma A.4 together with the
facts that n_i is nondecreasing (by Lemma A.2b) and that z_i(ℓ)
is established as in step I.4 of Table 3 show that the first
message MSG(m1,d1,ℓ) that can be received by i from ℓ after
t1 must have m1 > n1. Then the assertion follows from Lemma A.2a).

If F_i(ℓ)(t1-) = UP, then step II.3.1 in Table 3 requires

        N_i(ℓ)(t1-) = n1                                         (A.20)

and by the definition of LPC(m,n1) we have n1 ≥ m.

If D_i(ℓ)(t1-) = ∞, then ∃τ3 < t1 (possibly τ3 < LPC(m,n1))
such that

        [τ3,i,MSG(n1,d1,ℓ)], d1 = ∞,                             (A.21)

which together with (A.14) implies by Lemma A.3a) that
m > n1.

If D_i(ℓ)(t1-) ≠ ∞, then from b) follows ∃τ3 ∈ [LPC(m,n1),t1)
such that

        [τ3,i,MSG(n1,d1,ℓ)], d1 < ∞                              (A.22)

and the assertion follows from c) i).

iii)  From Lemma A.2, n_i is nondecreasing, so that once n_i is
increased, it cannot return to the old value.

From the algorithm, a node can leave {S2[m], S2̄[m]} and not
change n_i = m only via T21 or T23 or T2̄3. If T23 or T2̄3,
then R3 shows that it will strictly increase n_i when leaving
S3[m]. If T21[(m,m)], then c) ii) shows that it cannot
subsequently receive a message MSG(m,d) with d ≠ ∞, and in order to
enter S2[m], such a message must be received. Therefore, the
statement holds for {S2[m], S2̄[m]}.

To S1[m] one enters only from S2[m], so that a node cannot
enter S1[m] twice unless it enters {S2[m], S2̄[m]} twice, so
that the statement holds for S1[m].

If a node enters S3[m], by R3 it leaves it only with a higher
n_i, so that it cannot come back with the same n_i.

d)  The Fact in I.3 was proved in Lemma A.4. The Fact in I.4 follows from
property 2.7.7 in Sec. 2.7. Next, observe from II.2.3, II.2.7, II.6.3
and II.9.3 in Table 3 that

        Tφ3[i,(d1,d2),(p1,p2)]                                   (A.23)

implies d2 = ∞, p2 = nil, so Fact 32 is correct. Facts 13, 12, 23
and 2̄3 follow from Lemma A.2a) and A.2c), since if MSG is received at
i at time t and T13 or T12 or T23 or T2̄3 happen, then

        m = number received by i at t on p_i(t-) ≥ M_i(t-,p_i(t-)).   (A.24)

Fact 21 is correct, since if Tφ2[i,(d1,d2)], then d2 < ∞ and
since p_i = nil iff s_i = S3.

e)  i)  The assertion follows immediately from Lemma A.2 b) and from
checking changes of n_i in Table 3.

ii)  Recall that we are always considering times until PC(m).
Observe from II.3.1 in Table 3 that

        T21[t,i,(n1,n1)]                                         (A.25)

implies that N_i(ℓ)(t-) = n1 for all ℓ with F_i(ℓ) = UP, and
since from II.3.7 in Table 3 p_k(t) = i implies F_i(k) = UP,
we have N_i(k)(t-) = n1. Note further that D_i(k)(t-) ≠ ∞,
since otherwise k was some time before t in S3[n1] and
could set p_k = i only if i sent to k a message MSG with
number strictly higher than n1. But N_i(k)(t-) = n1,
D_i(k)(t-) ≠ ∞ implies from b) that ∃τ ∈ [LPC(m,n1),t) such
that

        Txy[τ,k,SEND(n1,d,i)], d ≠ ∞.                            (A.26)

Now if p_k(τ-) ≠ i, then Txy = T12, but in order for
p_k(t) = i ≠ p_k(τ+), k must have performed T21[τ1,k] at some
τ1 ∈ (τ,t). On the other hand, if p_k(τ-) = i, then Txy = T21.
Therefore k performed

        T21[η,k,(n1,n1),(p1,p2)], p2 = i                         (A.27)

at some time η ∈ [LPC(m,n1),t). So s_k(η+) = S1[n1].
From e) i), the fact that until t node k receives no number
higher than n1 and p_k(t) = i, one can easily see that k
remains in S1[n1] until time t.

f)  We refer to the properties to be proven here as tree properties. If
p_i = k, we say that i is a predecessor of k and k the successor
of i. Also, we look at the concatenation (n_i,s_i) and write
(n_i,s_i) ≥ (n_k,s_k) if n_i ≥ n_k and if n_i = n_k implies s_i ≥ s_k.

Using this notation observe from e) i), that

        Txy[i,(n1,n2)]

implies (n2,y) > (n1,x) except when Txy = T21.
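
The ordering on pairs (n,s) defined above is the usual lexicographic order. As a quick check, the sketch below encodes the definition literally and compares it with Python's built-in tuple comparison; the function name geq and the small value ranges are chosen only for illustration, with the states S1, S2, S3 represented by the integers 1, 2, 3.

```python
from itertools import product

def geq(a, b):
    """(n_a, s_a) >= (n_b, s_b): n_a >= n_b, and n_a == n_b implies s_a >= s_b."""
    (n_a, s_a), (n_b, s_b) = a, b
    return n_a >= n_b and (n_a != n_b or s_a >= s_b)

# Compare the literal definition with tuple comparison on a small grid.
pairs = list(product(range(3), (1, 2, 3)))
agrees_with_tuples = all(geq(a, b) == (a >= b) for a in pairs for b in pairs)
```

A higher cycle number always dominates, and the state index only breaks ties between equal cycle numbers.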

As before, we prove the tree properties by induction, assuming that
they hold up to time t- and showing that any possible change at
time t preserves the properties. The changes of interest here are
in the quantities n_i, s_i, p_i, d_i.

Let us consider all possible transitions:

T22̄[t,i]: only s_i changes, s_i(t+) = s_i(t-), so tree properties
are preserved.

T13[t,i], T23[t,i], T2̄3[t,i]: then p_i(t+) = nil, so no successor at t+.
Also by Lemma A.2 and induction hypothesis follows that if p_k(t) = i,
then

        (n_i,s_i)(t+) > (n_i,s_i)(t-) ≥ (n_k,s_k)(t),            (A.28)

so properties are preserved for all predecessors.

T12[t,i], T22[t,i], T2̄2[t,i] (change s_i and possibly n_i; no
change in p_i). Regarding predecessors, the proof evolves as for
T13. Regarding p_i, we see that

        Txy[t,i,MSG(m,d,ℓ),(n1,n2),(p1,p1)],                     (A.29)

where Txy = T12 or T22 or T2̄2, implies from steps II.1.1, II.4.1,
II.8.1 in Table 3 that ℓ = p1, d ≠ ∞ and from steps II.1.4, II.4.2,
II.8.2 that m = n2. From b) and R2, this implies that
∃τ ∈ [LPC(m,m),t) such that s_p1(τ) = S2[n2]. Now, if on (τ,t), p1
stayed in S2[n2] or performed any transition except T21[p1,(n2,n2)],
then T12[i] or T22[i] or T2̄2[i] preserve the tree properties.

We want to show by contradiction that p1 could not have performed
T21 on (τ,t). Suppose

        T21[τ1,p1,(n2,n2)], τ < τ1 < t,                          (A.30)

then by step II.3.1 of Table 3 we have N_p1(i)(τ1) = n2. Now we
distinguish between two cases:

If D_p1(i)(τ1) ≠ ∞, then by b), ∃τ2 ∈ (LPC(m,n2),τ1) such that

        [τ2,i,SEND(n2,d,p1)], d ≠ ∞                              (A.31)

which by R2 implies that s_i(τ2-) = S2[n2] or s_i(τ2+) = S2[n2].
But T12[t,i,(n1,n2)] or T22[t,i,(n1,n2)] or T2̄2[t,i,(n1,n2)]
says that i enters S2[n2] at time t, which contradicts c) iii).

If D_p1(i)(τ1) = ∞, then for some τ2 < τ1 (not necessarily
τ2 > LPC(m,n2))

        [τ2,i,SEND(n2,d,p1)], d = ∞

which implies that s_i(τ2+) = S3[n2]. But s_i(t+) = S2[n2] and
τ2 < t, which is impossible by R3 and Lemma A.2.

T32[t,i,(n1,n2),(nil,p1)]. Regarding predecessors the tree
properties are preserved since n2 > n1. Regarding successor, the
above implies that ∃τ ∈ (LPC(m,n2),t) such that

        [τ,p1,SEND(n2,d,i)].

Now, from Lemma A.2, n_p1(t) ≥ n_p1(τ). From R2, n_p1(τ) = n2.
Now, if n_p1(t) > n2, then

        (n_p1,s_p1)(t) > (n_i,s_i)(t+).

If on the other hand n_p1(t-) = n2, then the same argument as
for T12, T22 shows that p1 was in S2[n2] sometime before t
and could not return to S1[n2] in the meantime, so that

        (n_p1,s_p1)(t) ≥ (n_i,s_i)(t+).

In addition to the above, since here there is a change in p_i
from nil to ≠ nil, we have to check that this change does not
close a loop. This is seen from the fact that every node k
upstream from i at time t has

        (n_k,s_k)(t) ≤ (n_i,s_i)(t-) = (n1,3) < (n2,2) = (n_i,s_i)(t+)

and every node ℓ downstream from p1 has

        (n_ℓ,s_ℓ)(t) ≥ (n_p1,s_p1)(t) ≥ (n2,2).

T21[t,i,(n1,n1),(p1,p2),(d1,d1)]. If p_k(t) = i, then from e) ii)
follows that s_k(t) = S1[n1], so

        (n_i,s_i)(t+) = (n_k,s_k)(t).

Regarding successor, steps II.3.1 and II.3.7 of Table 3 show that
N_i(p2)(t-) = n1, D_i(p2)(t-) ≠ ∞, so that from b), ∃τ ∈ [LPC(m1,m),t)
such that

        [τ,p2,SEND(m,d,i)]

with m = n1 = n_p2(τ+), d = d_p2(τ+) = D_i(p2)(t-) - d_i,p2(τ).
Therefore from Lemma A.2,

        (n_p2,s_p2)(t) ≥ (n1,1) = (n_i,s_i)(t+).

Now suppose that the change in p_i closes a loop at t+.
Then the last expression and the induction hypothesis show that
at time t+

        (n_p_ℓ,s_p_ℓ) ≥ (n_ℓ,s_ℓ)

for all nodes ℓ around the loop, so that (n,s) must be
constant around the loop, namely

        (n,s) = (n1,1)

around the loop. Therefore s_p2(t) = S1[n1]. But by R2, s_p2(τ-) or
s_p2(τ+) = S2[n1] where τ is defined above, so by c) iii), node
p2 could not enter again S2[n1] between τ and t, so

        d_p2(t) = d_p2(τ+) = D_i(p2)(t-) - d_i,p2(τ).

But from steps II.3.2 and II.3.7 of Table 3

        d1 ≥ D_i(p2)(t-) = d_p2(t) + d_i,p2(τ)

which from Assumption 2.7.2 implies that

        d1 = d_i(t+) > d_p2(t).

On the other hand, the induction hypothesis implies that since
(n_ℓ,s_ℓ) = (n1,1) around the loop, we have

d,(t)  >.  (t) 

i - P* 

for all ℓ ≠ i around the loop and this provides a contradiction,
therefore no loop is closed by the change in p_i.
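
The shape of this contradiction can be illustrated on a toy configuration. The sketch below is not the paper's statement of Theorem 1: the edge condition in edge_ok is an assumed reading of the argument above (along a pointer i -> p_i the pair (n,s) does not decrease, and when both ends are at the same (n,1) the distance strictly decreases), and all names and data are hypothetical.

```python
def has_loop(p, start):
    """Follow preferred-neighbor pointers from start; True if a node repeats."""
    seen = set()
    node = start
    while node is not None:
        if node in seen:
            return True
        seen.add(node)
        node = p.get(node)
    return False

def edge_ok(i, p, n, s, d):
    """Assumed per-edge condition between node i and its successor p[i]."""
    k = p.get(i)
    if k is None:
        return True
    if (n[k], s[k]) < (n[i], s[i]):
        return False
    if (n[k], s[k]) == (n[i], s[i]) and s[i] == 1:
        return d[i] > d[k]
    return True

# A small tree rooted at SINK satisfies the condition and has no loop.
tree = {"a": "b", "b": "SINK", "SINK": None}
n = {"a": 1, "b": 1, "SINK": 1}
s = {"a": 1, "b": 1, "SINK": 1}
d = {"a": 2, "b": 1, "SINK": 0}
tree_ok = all(edge_ok(i, tree, n, s, d) for i in tree)

# A two-node loop at constant (n,s) = (1,1) cannot satisfy it for any distances,
# because the distances would have to decrease strictly all the way around.
loop = {"a": "b", "b": "a"}
loop_ok = any(
    all(edge_ok(i, loop, {"a": 1, "b": 1}, {"a": 1, "b": 1}, {"a": da, "b": db})
        for i in loop)
    for da in range(3) for db in range(3)
)
```

The second check mirrors the final step of the proof: with (n,s) constant around a loop, no assignment of distances is consistent.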

g)  i)  During (τ1,t), no link is brought up by i because of Lemma A.5.
If there are failures, let τ3 be the first time on (τ1,t)
such that

        [τ3,i,FAIL(k)].

Then T23[τ3,i,(n1,n1)] or T22̄[τ3,i,(n1,n1)] happen with n1 = m.
In either case, e) i) shows that to exit S3[n1] or S2̄[n1], one
has to increase n_i so that it is not possible that
T21[t,i,(n1,n1)].

So no failure can occur.

ii)  Consider the sequences of nodes and instants

        i = i_0, i_1, i_2, ..., i_s = SINK

        t = t_0 > t_1 > t_2 > ... > t_s

such that

        Tφ2[t_u,i_u,(φ,n2),(p1_u,p2_u)]

where n2 = m and p2_u = i_(u+1). There must have existed such
sequences if Tφ2[i_0]. Suppose ∄τ ∈ [t_0,PC(m)] such that

        T21[τ,i,(n2,n2)].

We want to show that ∄τ1 ∈ [t_1,PC(m)] such that

        T21[τ1,i_1,(n2,n2)].


If  there  existed  such  a xl,  it  follows  from  g)  i)  that 
F.  (i  ) (xl)  - UP. 

h ° 

We  vault  to  show  now 

[r2,io,SEND(n2Id,i1)],  d * - , 

and  3t  (PC(m),xl)  such  that  such  a massage  with  4 <f>  • is 
sent.  For  t2  < tQ  ^ this  follows  respectively  from  R2„  R3  and 


that 


2 < ti  such  that 
R2, c) iii). For τ2 = t_0, it follows from the fact that

pi  (V>  ■ h ■ 

0 

For  t2 e ( to,PC(m) ) , the  only  possibilities  for  i if  T21  does 

Si  ' 

not  happen,  are  to  stay  in  S2[m]  or  T22[(n2,n2) ],  or 
T23[(n2,n2) ] , or  T23[ (n2,n2) } . In  all  cases  iQ  will  not  send 
any  message  to  i^.  ‘ 

The above show that N_i1(i_0)(τ1-) ≠ m = n2 so that

        T21[τ1,i_1,(n2,n2)]

is impossible. Repeating the proof, it follows that ∄τ_s such
that

        T21[τ_s,SINK,(n2,n2)], n2 = m,

which contradicts the assumption that there is a proper
completion at time PC(m). This proves the first part of g) ii). The
second part follows because T21[τ1,i,(n2,n2)], n2 = m is not
possible if T2̄3[i,(n2,n2)] or T22̄[i,(n2,n2)] happen.

h)  If [τ,k,SEND(m,d ≠ ∞,i)], then F_k(i)(τ) = UP and by R2 either

        Tx2[τ,k,(φ,n2)], n2 = m, x = 1,2,3

or

        T21[τ1,k,(n2,n2)], n2 = m.

If Tx2 then g) ii) implies ∃τ2 ∈ (τ,PC(m)) such that

        T21[τ2,k,(n2,n2)], n2 = m

and F_k(i)(τ1) = UP. Therefore T21 happens at node k at some time
(τ1 or τ2). Call this time η. We have then N_k(i)(η) = m. By
b) either ∃τ3 ∈ [PC(m),η) such that

        [τ3,i,SEND(m,d ≠ ∞,k)]

or ∃τ4 < η such that

        [τ4,i,SEND(m,d = ∞,k)].

But by R2, this means that i is at some time before η in S3[m]
or is at some time between PC(m) and PC(m) in S2[m]. If the
first holds, node i will stay in S3[m] at least until PC(m).
If the latter holds, then by g) ii) it must perform T21[i,(n2,n2)]
before PC(m). But since it still has N_i(k)(t) = m, D_i(k)(t) ≠ ∞
or has not yet received the message by time t, property c) i)
implies that node i could not perform T21[i,(n2,n2)] before
time t. Therefore it will perform it later, so q.e.d.

Proof that a) holds at time PC(m)

i)  Node i cannot be in S2[m] because of g) ii) and c) iii). It
cannot be in S2̄[m] because it must have been in S2[m] before
and because of g) ii).

ii)  Take t = PC(m) in h). Then h) says that

        s_i(PC(m)) = S2[m] or S3[m].

But g) ii) and c) iii) imply that s_i(PC(m)) ≠ S2[m].

iii)  Follows by contradiction, because if we had

        N_i(k)(PC(m)) = m, D_i(k)(PC(m)) ≠ ∞,

it follows by taking t = PC(m) in h) that
s_i(PC(m)) = S2[m] or S3[m].

This completes the proof of Theorem A.1.

Appendix B

In Appendix A we have proved Theorems 1, 2 and 3. This appendix is
devoted to proofs of the remaining statements, namely Theorem 4 (normal
activity) and Propositions 1 and 2 that lead to the recovery theorem,
Theorem 5. The proofs are organized as follows: Lemma B.0 is preliminary
and shows that on any link (i,ℓ) the only two "stable" situations are
{F_i(ℓ) = F_ℓ(i) = DOWN} or {F_i(ℓ) ≠ DOWN, F_ℓ(i) ≠ DOWN}. Lemmas B.1 and
B.2 prove Proposition 1, Lemma B.3 proves Theorem 4, and Proposition 2 is
proved by the series of four lemmas B.4 to B.7.

Lemma B.0

If F_i(ℓ)(t1) = DOWN, F_ℓ(i)(t1) ≠ DOWN, then in finite time after t1
we have either F_i(ℓ) = F_ℓ(i) = DOWN or {F_i(ℓ) ≠ DOWN and F_ℓ(i) ≠ DOWN}.

Proof

If F_ℓ(i)(t1) = READY, then i and ℓ arrived to this situation
from {F_ℓ(i) = F_i(ℓ) = DOWN} or {F_ℓ(i) = F_i(ℓ) = READY} or
{F_ℓ(i) = READY, F_i(ℓ) = UP}. Then assumptions 2.7.9 imply the assertion.

If F_ℓ(i)(t1) = UP, then i and ℓ arrived to this situation from
{F_ℓ(i) = READY, F_i(ℓ) = DOWN} or {F_ℓ(i) = F_i(ℓ) = UP}, or
{F_ℓ(i) = UP, F_i(ℓ) = READY}. In the first case, the discussion reduces to
the first part of the proof, whereas for the second and third case,
assertion 2.7.9 a) in Sec. 2.7 proves the assertion.

Lemma B.1

Proposition 1(a).

Proof

Clearly, n_i(t1-) < m2 for all i. Therefore (10) may happen only
at or after t1.

Let

        A(t) = {i : i ∈ L(t) and i effected (10) with t2_i < t}.

If ∃t2 such that A(t2) = L(t2), then the proof is complete. Otherwise,
for a given t3, we will show (by contradiction) that ∃t, t3 < t < ∞
such that

        A(t) ⊇ A(t3) and A(t) ≠ A(t3).                           (B.1)

Hence by induction, the set A(t) keeps growing until it equals L(t).
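
The induction step relies on a simple finiteness argument: a set that can only grow, and that strictly grows whenever it is not yet complete, must reach the full node set in finitely many steps. The sketch below illustrates only that argument on a hypothetical connected graph; it spreads membership in synchronous rounds, which is an idealization and not how the protocol's messages are timed.

```python
def grow_until_complete(adj, start):
    """Grow a set from start along links until nothing new can be added."""
    A = {start}
    history = [frozenset(A)]
    while True:
        reachable_next = {j for i in A for j in adj[i]} - A
        if not reachable_next:
            return history
        A |= reachable_next
        history.append(frozenset(A))

# Hypothetical connected network: SINK - a - b - c.
adj = {"SINK": ["a"], "a": ["SINK", "b"], "b": ["a", "c"], "c": ["b"]}
history = grow_until_complete(adj, "SINK")
```

Each entry of history is a proper superset of the previous one, and the last entry is the whole node set, mirroring (B.1) applied repeatedly.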

Since there are no pertinent topological changes and all i ∈ A(t)
have n_i(t) = m2, property R10 implies that the set A(t) is nondecreasing
as t increases. Therefore to prove part i) of Proposition 1(a) it is
sufficient to show that the following cannot hold:

        ∀t > t3, A(t) = A(t3) ≠ L(t)                             (B.2)

Let

        B(t) = {i | i ∈ L(t) and i ∉ A(t)},

        A'(t) = {i | i ∈ A(t) and i has a potentially working link to a node of B(t)},

        B'(t) = {i | i ∈ B(t) and i has a potentially working link to a node of A(t)}.

The following three claims will contradict (B.2).

Claim 1

If (B.2) holds, then ∃t4 ∈ (t3,∞) such that ∀j ∈ B'(t4), ∃t4_j < t4
such that [t4_j,j,MSG(m2)] (i.e. all nodes of B'(t4) receive m2 in
finite time).

Proof of Claim 1

At time t2_i < t3, node i ∈ A'(t2_i) performs transition (10). Now
observe that since no pertinent topological changes occur, property R10
implies that for all ℓ, F_i(ℓ) cannot be changed from or to DOWN after t2_i.
Therefore if F_i(ℓ)(t2_i-) = DOWN then F_i(ℓ)(t) = DOWN for t > t2_i and
if F_i(ℓ)(t2_i-) = READY or UP, then F_i(ℓ)(t) = UP for t > t2_i (see II.1.5,
II.4.2, II.7.7, II.8.2 in Table 3). For links (i,ℓ), where i ∈ A'(t2_i),
ℓ ∈ B'(t2_i) and F_i(ℓ)(t2_i+) = UP, observe from II.1.6 in Table 3 that if
p_i(t2_i) ≠ ℓ, then

        [t2_i,i,SEND(m2,ℓ)].

Since by Lemma A.2c) we have

        p_i(t2_i) ∉ B(t2_i)

and since property 2.7.9 in Sec. 2.7 insures that the above message will
arrive, there is a time t4 for which all nodes j that were in B'(t2_i) for
some i, either are not in B'(t4) anymore or have received MSG(m2). Also
observe that B'(t4) cannot be empty, since then (B.2) is contradicted.

Let t5_jk denote the time at which j ∈ B'(t4) receives MSG(m2,k), where
k ∈ A'(t4). If ∃j ∈ B'(t4) such that p_j(t5_jk) = k for some k ∈ A'(t4) then
from II.1.1, II.4.1, II.8.1 in Table 3, the transition Tφ2[j,(φ,m2)] occurs,
contradicting (B.2), q.e.d. Otherwise,

Claim 2

If j ∈ B'(t4) such that p_j(t5_jk) ≠ k then ∀t > t5_jk, p_j(t) ≠ k.

Proof of Claim 2

Suppose

        Txy[t,j,(p1,p2 = k)], t > t5_jk.

If x ≠ 3, by R5 Txy = T13 or T21 or T23 or T2̄3.

But T23, T13, T2̄3 imply p2 = nil ≠ k, therefore this cannot happen.

T21 implies ∀q, N_j(q)(t) = n_j < m2, but N_j(k)(t) = m2, hence T21
cannot happen.

If x = 3 then T32[t,j,MSG(m2)] happens, contradicting (B.2), q.e.d. Claim 2.

Claim 3

In finite time, all nodes i ∈ B(t4) will effect Tφ3[i,(φ,m)],
m ≤ m1 without effecting T3φ thereafter.

Proof of Claim 3

n_i is updated in T12, T13, T22, T23 and T32 only. For all
i ∈ B(t4), Tφ2[i,(φ,m2)] does not occur because of (B.2), and Tφ3[i,(φ,m2)]
does not occur because there are no pertinent topological changes. Hence,

        ∀i ∈ B(t4) and ∀t > t4, n_i(t) ≤ m1.

Since after t4 no update cycles with m ≤ m1 are started by Theorem 2(ii),
the number of messages with d < ∞ generated by the nodes of B(t4) is finite.
Similarly, since the number of arcs is finite, the number of messages
FAIL is also finite. Consider B(t4) after all these messages are
generated and received. Then ∀i ∈ B(t4), T3φ[i] cannot occur and
Txy[i,(p1,p2 ≠ p1)] implies p2 = nil. Then

1.  if ∀k ∈ B(t4), p_k = nil, then q.e.d. Claim 3;

2.  otherwise, after a sufficiently long period of time t_mx, by Claim 2
and Theorem A.1, there exist k and i such that:

        i,k ∈ B(t3), p_k = i, p_i = nil.

When p_i was set to nil, Txy[i,SEND(m,d = ∞,k)] occurs. At t_mx
this message is not yet received by k, because p_k(t_mx) = i. After
this message is received, node k effects Tφ3, enters S3 and does
not leave it anymore. By induction, q.e.d. Claim 3.


The proof of Proposition 1(a)(i) is completed as follows. Consider a
node j ∈ B'(t4). Define t3_j to be the time at which Tφ3[t3_j, j] occurs by
Claim 3. But

if t3_j < t5_jk then T32[t5_jk, j] happens,
if t3_j > t5_jk then T32[t3_j', j] occurs, and n_j = m2,

which contradicts (B.2), q.e.d.

To prove part (ii) of Proposition 1(a), we investigate further the
situation in L(t2) at time t2. Observe that since all nodes in L(t2) have
n_i = m2, and no pertinent topological changes happen, it follows from R10 and
Lemma B.0 that for any link (i,l) such that i ∈ L(t2), l ∈ L(t2), it cannot
happen that at time t2 we have F_i(l) = DOWN, F_l(i) ≠ DOWN. Also
F_i(l) = READY is not possible, because lack of pertinent topological changes
implies that F_l(i)(t2_l−) = READY as well, and then II.1.5 in Table 3 shows that,
for example, F_l(i)(t2_l+) = UP and therefore F_l(i)(t2) = UP. Therefore, for
links (i,l) connecting nodes in L(t2), the only possibilities at time t2
are {F_i(l) = F_l(i) = DOWN}, {F_i(l) = F_l(i) = UP}, hence Proposition 1(a)(ii)
is proved.


Next, assuming Proposition 1(a), which was proved by Lemma B.1, we now
prove Proposition 1(b).

Lemma B.2

Let L(t) be as in Lemma B.1, and suppose that a new cycle
Tφ2[t1,SINK,(φ,m1)] is started. Suppose also that no pertinent topological
changes have happened before t1 while n_SINK = m1 and that no such changes
will take place after t1 for a sufficiently long period of time. Define
t2_i to be the smallest time t such that

Tφ2[t,i,(φ,m1)], t > t1

occurs. Suppose also there exists t2, t1 ≤ t2 < ∞, such that for all
i ∈ L(t2)

Tφ2[t2_i,i,(φ,m1)]

occurs with t1 < t2_i ≤ t2, and t2 = max (t2_i) over i ∈ L(t2). Then:

i) There exists a time t3 < ∞ such that t2 < t3 and that
T21[t3,SINK,(m1,m1)] occurs;

ii) ∀t ∈ [t2,t3], we have H(t) = L(t) = H(t2);

iii) RG(t3) for the nodes in L(t3) is a single tree rooted at SINK.


Proof 

We prove first that there is PC(m1) after t1, then we show that
there is no PC(m1) between t1 and t2.

Since there are no pertinent topological changes, after entering S2[m1]
at t2_i each node i ∈ L(t2) can only perform transitions between states
S1 and S2. Furthermore, by Theorem 1(i), after t2, these nodes form a
single tree rooted at SINK. Consider a time t', t' > t2. Since there are
no pertinent topological changes, L(t') = L(t2). Also, by Theorem 2(iii),
if a node i ∈ L(t2) enters S2[m1] after t2, PC(m1) has occurred after t1.

1. If ∀i ∈ L(t'), s_i(t') = S1, then there exists t3, t1 < t3 < t',
such that T21[t3,SINK,(m1,m1)] occurred;

2. otherwise, consider a node k such that s_k(t') = S2 and

∀j, if p_j(t') = k, then s_j(t') = S1;                    (B.3)

such a node k always exists. Classify the neighbors of k into:

A = {i: F_i(k)(t') = UP and s_i(t') = S1}

B = {i: F_i(k)(t') = UP and s_i(t') = S2}.

At some time in the interval [t1,t'], the nodes in A have sent
messages MSG(m1,d ≠ ∞) to all their neighbors. At some time in the
same interval, those in B have sent such messages to all their
neighbors except p_i(t'). Hence by (B.3), k will receive messages
MSG(m1,d ≠ ∞) from all its neighbors, at a finite time, say t4. Then
2.1 if s_k(t4+) = S2, this means that ∃i with F_k(i)(t4) = UP such that N_k(i)(t4) = nil,
which implies that T21[k,(m1,m1)] occurred in the interval [t1,t4],
hence by Theorem 2(iii), PC(m1) occurred between t1 and t4;

2.2 if s_k(t4+) = S1, by induction, PC(m1) will occur in finite time
after t1.


We show next that PC(m1) cannot happen in [t1,t2]. Suppose that
at t5, the first PC(m1) after t1 occurs. Let k be a node such that
t2_k < t5 and k ∈ L(t2), hence since there are no pertinent failures, there
exists a j ∈ L(t2) such that F_j(k)(t2_k) = F_j(k)(t5) = UP. But j sent to k
a message MSG(m1,d ≠ ∞) in the interval [t2_j,t5]; on the other hand by
Theorem 3 such a node k does not exist.

Since there are no pertinent topological changes, we have
L(t2) = L(t3), and according to Theorem 1(i) these nodes have preferred links
forming a single tree rooted at SINK and hence iii).

Finally, looking at the situation in the network at time t2 as described
in Lemma B.1, and for all t ∈ [t2,t3], we observe that for all (i,l) for which
F_i(l)(t2) = UP we must have F_i(l)(t) = UP and if F_i(l)(t2) = DOWN, we must
have F_i(l)(t) = DOWN. This completes the proof of ii).
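The tree property in part iii) can be checked mechanically on a snapshot of the preferred-neighbor pointers. The following is a minimal sketch, assuming the pointers p_i are given as a Python dictionary; the function name and the dictionary layout are illustrative and do not appear in the original text.

```python
def forms_tree_rooted_at_sink(preferred, sink):
    """Return True if following preferred-neighbor pointers from every node
    reaches `sink` without revisiting a node and without hitting a nil pointer."""
    for start in preferred:
        seen = set()
        node = start
        while node != sink:
            # A repeated node is a loop; a missing or nil pointer is a dead end.
            if node in seen or node not in preferred or preferred[node] is None:
                return False
            seen.add(node)
            node = preferred[node]
    return True


# Hypothetical snapshots: one valid tree and one two-node loop.
tree = {"a": "SINK", "b": "a", "c": "a"}
loop = {"a": "b", "b": "a"}
tree_ok = forms_tree_rooted_at_sink(tree, "SINK")
loop_ok = forms_tree_rooted_at_sink(loop, "SINK")
```

The check is a direct reading of "single tree rooted at SINK": every pointer chain must end at SINK, so it rejects both loops and chains that end at a node with p_i = nil.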


Lemma B.3


Theorem 4.


Proof


By the Algorithm, a new cycle T12[t1,SINK,(m1,m1)] can start only
if all previous cycles with the same counter number m1 were properly
completed. Since cycle counter numbers are non-decreasing, the first cycle with
m1 was started at a time, say t', by

T12[t',SINK,(m0,m1)], m1 > m0.

This transition satisfies the condition of Proposition 1. Hence in a finite
time, say t'', the cycle is properly completed, L(t'') forms a tree rooted
at SINK, all i ∈ L(t'') have n_i = m1, and since there are no pertinent
topological changes, for all t ≥ t'':

1. H(t) = L(t) = L(t''), q.e.d. Theorem 4(b), and

2. by Theorem 1(i) all nodes i ∈ L(t) form a single tree rooted at
SINK, q.e.d. Theorem 4(d,ii).

Define A_k to be the set of nodes that are on the tree at time t1,
at a distance of k nodes from the SINK. A_0 = SINK and it is assumed by
Theorem 4 that T12[t2_SINK = t1,SINK,(m1,m1)] occurs. Suppose all i ∈ A_k
effect T12[t2_i,i,(m1,m1)], sending messages MSG(m1) to all j ∈ A_{k+1}
through their p_j(t1). But since there are no pertinent topological changes
after t1, p_j can only change by T21, and since s_j(t1) = S1, only after
T12. Then, all j ∈ A_{k+1} will receive messages MSG(m1) at a finite time
t2_j from p_j(t2_j), which trigger the occurrence of T12[t2_j,j,(m1,m1)], and
by induction on k, q.e.d. Theorem 4(c).

The remaining part of Theorem 4 follows directly from Lemma B.2 by assuming
Theorem 4(c). Theorem 4(a) follows directly from the algorithm for SINK.
This completes the proof.
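The induction on k used above walks the tree level by level, from A_0 = SINK outward. A small sketch of that grouping follows, assuming the preferred-neighbor pointers at time t1 are available as a dictionary; the function name and example topology are hypothetical.

```python
def levels_from_sink(preferred, sink):
    """Group nodes into A_0, A_1, ... where A_k holds the nodes that are
    k hops from `sink` along preferred links."""
    children = {}
    for node, parent in preferred.items():
        children.setdefault(parent, []).append(node)
    levels = [[sink]]
    frontier = [sink]
    while frontier:
        # Every node of A_{k+1} has its preferred neighbor in A_k.
        nxt = [c for p in frontier for c in sorted(children.get(p, []))]
        if nxt:
            levels.append(nxt)
        frontier = nxt
    return levels


# Hypothetical tree: a and b point at SINK, c points at a.
example_levels = levels_from_sink({"a": "SINK", "b": "SINK", "c": "a"}, "SINK")
```

In the proof, MSG(m1) reaches level A_{k+1} only after level A_k has effected T12, which is the order in which this sketch enumerates the levels.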

Proposition 2 will be proved by Lemmas B.4 through B.7. When a REQ(m1)
is generated, it is placed in the queue for processing. If, when the
REQ(m1) is processed, the node is at S2, S2̄ or S1, then a REQ(m1) is sent
by this node to its current preferred link. The proof of Proposition 2 for
these cases is given in Lemma B.5 (for S2 or S2̄) and Lemma B.7 (for S1).
Lemma B.6 proves the proposition for the case where there is a node in state
S3[m1]. Lemma B.4 is used to simplify proofs.
Lemma B.4

If a REQ(m1) is generated, then either:

1. REQ(m1) is processed only by nodes having n_i = m1, and all nodes
j have n_j ≤ m1, or

2. a REQ(m1) arrived at SINK.

Proof

By Theorem 1(ii) and by the Algorithm, REQ(m1) is not received
(i.e. processed) by a node i with n_i < m1. On the other hand, if there
exists a node i with n_i > m1, the SINK started a cycle with m > m1;
this can happen only following the arrival of REQ(m1) to SINK, q.e.d.


Lemma B.5


If a node i sends REQ(m1) while s_i = S2[m1] or S2̄[m1], then
a REQ(m1) arrived or will arrive at SINK in finite time.


Proof


Consider the strings of nodes and instants

i = i_0, i_1, i_2, ..., i_m = SINK

such that

t_0 > t_1 > t_2 > ... > t_m

Tφ2[t_u,i_u,(φ,n2),(p1_u,p2_u)],

where n2 = m1, p2_u = i_{u+1}. There must exist such a string if s_i = S2[m1]
or S2̄[m1]. The string has no loops, otherwise Lemma B.4, Theorems 1, 2 or 4
will be contradicted.


Suppose that at time t2_u, a node i_u sends REQ(m1) to i_{u+1}.
Suppose also that in the interval [t_u,t2_u], node i_u effects no transition
except possibly T22. After t_{u+1}, the first transition executed by i_{u+1}
could be

T22[i_{u+1}]; q.e.d. by Theorem 3 and Lemma B.4.

T22̄[i_{u+1}]; in which case a failure is detected by i_{u+1} and REQ(m1)
is sent to i_{u+2}.

T21[i_{u+1}]; this transition is executed only after receiving a message
from i_u. Such a message is sent by i_u when T21[i_u] happens,
i.e. after i_u has sent REQ(m1). Since FIFO is preserved, i_{u+1}
will receive and therefore send REQ(m1) to i_{u+2} before T21[i_{u+1}]
happens, i.e. while s_{i_{u+1}} = S2.

T23[i_{u+1}]; in this case there exists i_r, r > u + 1, such that T22̄[i_r]
occurs and i_r sends REQ(m1) to i_{r+1}.

Thus by induction, REQ(m1) arrived or will arrive at SINK in finite time.


Lemma B.6

If there exists a node that effects Tφ3[(φ,m1)], then a REQ(m1)
arrived or will arrive at SINK in finite time.


Proof

Let PC_j (j = 0,1,2,...) denote the j-th occurrence of PC[m1].
Given a node i and a time t such that Tφ2[i,(φ,m1)] has occurred
before t, if PC_j is the last PC[m1] before t after which
Tφ2[i,(φ,m1)] occurred, then define E_i(t) = j + 1.

By Lemma B.4, we have to prove only the case in which n_i ≤ m1 for
all i. Thus, if a node i is in state S3[m1], this node will not execute
any further transitions.
Property

Given a time t, suppose p_i(t) = k and n_i(t) = n_k(t) = m1, then

E_i(t) ≤ E_k(t).

This can be proved as follows:

Suppose that prior to t and after PC_a, p_i was last set to be k.
This can be done only by T21[i] or T32[i]. Since at PC_a,
s_i ≠ S2[m1] (by Theorem 3), this implies that Tφ2[i] occurred
after PC_a and Tφ2[i] cannot occur again before t because this
will set p_i again. Hence E_i(t) = a + 1. The occurrence of T21[i]
or T32[i] implies that a message from k with d < ∞ arrived at
i after PC_a. By Theorem 3, this message was sent after PC_a, this
being possible only if k effected Tφ2[k] after PC_a. Since
E_k is non-decreasing, E_k(t) ≥ a + 1.

Since after a node effected Tφ3[(φ,m1)] the same node cannot perform
any further transitions, only a finite number of transitions Tφ3[(φ,m1)]
can be executed in the network. If Tφ3[(φ,m1)] happens, there exists a
node which detects a failure in its best link and executes Tφ3[(m1,m1)].
Define B1 as the set of nodes for which Tφ3[(m1,m1)] happens, that is

B1 = {i: Tφ3[t_i,i,(m1,m1)] happens}.

Define B2 as the subset of B1 for which Tφ3[(m1,m1)] happens with the
highest E_i, i.e.

B2 = {j: j ∈ B1 and E_j(t_j) = max E_i(t_i) over i ∈ B1}.
Case 1: Suppose there exists i ∈ B2 that effects T23[i,(m1,m1)].

Let max E_i(t_i) over i ∈ B1 be a + 1. Then at PC_a, by Theorem 3,
s_i ≠ S2[m1]. Thus the first i ∈ B2 that effects T23[(m1,m1)] has
a path to SINK at t_i (by Theorem 1). From all i ∈ B2 that
effect T23[t_i,i,(m1,m1)] while having a path to SINK, let q_0
denote the node having the shortest path. Suppose the path is

q_0, q_1, ..., q_k, q_{k+1} = SINK.

By Theorem 1 all q_r have s_{q_r}(t_{q_0}) = S2[m1]. But q_1 can only
effect T21 or T22, and q_1 cannot effect T21 unless receiving
a message from q_0, which cannot be sent because q_0 does not
effect T21. Hence q_1 will detect a failure of link (q_0,q_1)
and by Lemma B.5 the proof is complete.


Case 2: Suppose there is no i ∈ B2 that effects T23[i,(m1,m1)].

Let q_0 ∈ B2 denote a node such that d_{q_0}(t_{q_0}−) = min d_i(t_i−) over i ∈ B2,
and suppose p_{q_0}(t_{q_0}−) = q_1. Node q_1 cannot effect T23
(definition of Case 2) and cannot effect T13 (violates the definition
of q_0). Thus, q_1 detects a failure of link (q_0,q_1) and
a REQ(m1) is generated.

If at any time this REQ(m1) enters a node at S2 or S2̄, then
q.e.d. by Lemma B.5. Otherwise the REQ(m1) keeps moving through
nodes at S1 having decreasing d_i. The REQ(m1) cannot be received
by a node at S3 because this violates Case 2 or the definition
of q_0. Since for all i, d_i ≥ 0, d_i is an integral
number and the only node with d_i = 0 is the SINK, the REQ(m1)
arrives at SINK after a finite number of steps. Q.e.d.
Lemma B.7

If a node i_0 sends a REQ(m1) while s_{i_0} = S1, then a REQ(m1)
arrived or will arrive at SINK in finite time.

Proof

By Lemma B.4, we have to prove only the case in which for all i,
n_i ≤ m1, and by Theorem 1, the REQ(m1) sent by i_0 may encounter only
nodes having n_i = m1.

If there exists a node i such that s_i = S3[m1], then q.e.d. by
Lemma B.6. Hence we may assume that for all i, s_i ≠ S3[m1] and therefore
by Theorem 1 the REQ(m1) is in a tree rooted at SINK. Thus as in the
proof of Lemma B.6, the REQ(m1) either arrives at a node in S2 or S2̄
(q.e.d. by Lemma B.5) or travels through nodes at S1, with decreasing d_i,
until it arrives at SINK, q.e.d.
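The termination argument used in Lemmas B.6 and B.7 relies on d_i being a nonnegative integer that strictly decreases along the path of the REQ. The sketch below illustrates that argument only; the function name, the dictionaries and the example values are assumptions made for illustration and are not part of the original algorithm tables.

```python
def forward_req(start, preferred, dist, sink):
    """Follow preferred links from `start` to `sink`, checking that the
    distance estimate d strictly decreases at every hop."""
    path = [start]
    node = start
    while node != sink:
        nxt = preferred[node]
        if dist[nxt] >= dist[node]:
            raise ValueError("distance estimate did not decrease")
        path.append(nxt)
        node = nxt
    return path


# Hypothetical S1 nodes with integral distance estimates; only SINK has d = 0.
req_preferred = {"c": "b", "b": "a", "a": "SINK"}
req_dist = {"c": 3, "b": 2, "a": 1, "SINK": 0}
req_path = forward_req("c", req_preferred, req_dist, "SINK")
```

Because each hop lowers an integer that is bounded below by 0, the number of hops cannot exceed the starting estimate, which is the finiteness claim of the lemma.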

Footnote

1.  The  FACTS  given  in  the  algorithm  are  displayed  for  helping  in  its 
understanding  and  are  proved  in  Theorem  2. 


Table 1 - The Basic Algorithm


For MSG(d,l)

    N_i(l) ← RCVD;
    D_i(l) ← d + d_il;
    CT ← 0;
    Execute FINITE-STATE-MACHINE.


BASIC-FINITE-STATE-MACHINE

State S1

T12:  Condition 12   MSG(d, l = p_i), CT = 0.
      Action 12      d_i ← min D_i(k) over k: N_i(k) = RCVD;
                     transmit MSG(d_i) to all k s.t. k ≠ p_i.

State S2

T21:  Condition 21   ∀k, then N_i(k) = RCVD.
      Action 21      transmit MSG(d_i) to p_i;
                     p_i ← k* that achieves min D_i(k);
                     ∀k, set N_i(k) ← nil;
                     CT ← 1.
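To make the interplay of T12 and T21 in Table 1 concrete, the following is a minimal single-process simulation of one update cycle. It is a sketch under several assumptions that are not in the original: all messages travel through one global FIFO queue, link weights are symmetric, the initial preferred neighbors already form a tree, and the CT flag is omitted because each message is processed exactly once. The names `run_cycle`, `links` and `preferred` are illustrative.

```python
from collections import deque

INF = float("inf")


def run_cycle(links, preferred, sink):
    """Run one update cycle of the basic algorithm and return (p, d).

    links:     {(i, l): d_il}, listed in both directions (assumed symmetric)
    preferred: {i: p_i} for every non-sink node, assumed to form a tree
    """
    nbrs = {}
    for (i, l) in links:
        nbrs.setdefault(i, set()).add(l)
    state = {i: "S1" for i in nbrs}
    d = {i: INF for i in nbrs}
    d[sink] = 0
    D = {i: {} for i in nbrs}  # D_i(l); a present key stands for N_i(l) = RCVD
    p = dict(preferred)
    # The sink starts the cycle by sending MSG(0) to every neighbor.
    queue = deque((sink, k, 0) for k in sorted(nbrs[sink]))  # (sender, receiver, d)
    while queue:
        l, i, d_recv = queue.popleft()
        D[i][l] = d_recv + links[(i, l)]
        if i == sink:
            continue
        if state[i] == "S1" and l == p[i]:                # T12
            d[i] = min(D[i].values())
            queue.extend((i, k, d[i]) for k in sorted(nbrs[i]) if k != p[i])
            state[i] = "S2"
        if state[i] == "S2" and set(D[i]) == nbrs[i]:     # T21
            queue.append((i, p[i], d[i]))
            p[i] = min(D[i], key=D[i].get)
            D[i] = {}
            state[i] = "S1"
    return p, d


# Hypothetical triangle: the direct link SINK-a is expensive, the detour via b is cheap.
links = {("SINK", "a"): 5, ("a", "SINK"): 5,
         ("SINK", "b"): 1, ("b", "SINK"): 1,
         ("a", "b"): 1, ("b", "a"): 1}
p1, d1 = run_cycle(links, {"a": "SINK", "b": "SINK"}, "SINK")
p2, d2 = run_cycle(links, p1, "SINK")
```

In this example node a switches its preferred neighbor to b at the end of the first cycle, while its distance estimate d_a still reflects the old tree; the improved estimate only appears in the second cycle, since Action 12 computes d_i from the messages received so far along the current tree.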


Table 2a - Variables of the Algorithm of Table 3.

Note: It is assumed that the network is composed of K nodes.

Variable Name   Meaning                                      Domain of Values
-------------   ------------------------------------------   -----------------
p_i             preferred neighbor                           nil, 1, 2, ..., K
d_i             estimated distance from SINK                 ∞, 1, 2, 3, ...
d_il            estimated distance of link (i,l)             1, 2, 3, ...
n_i             current counter number                       0, 1, 2, ...
mx_i            largest number m received by node i          0, 1, 2, ...
CT              control flag                                 0, 1
N_i(l)          last number m received from l after          nil, 0, 1, 2, ...
                i completed last update cycle
D_i(l)          d + d_il for last d received from l          ∞, 1, 2, ...
F_i(l)          status of link (i,l)                         DOWN, READY, UP
z_i(l)          synchronization number used by i to          0, 1, 2, ...
                bring link (i,l) UP

Table 2b - Messages received by the algorithm of Table 3.

Message Format   Meaning                                     Domain of Values
--------------   -----------------------------------------   ------------------
MSG(m,d,l)       updating message from l                     m = 0, 1, 2, ...
                                                             d = ∞, 0, 1, 2, ...
                                                             l = 1, 2, ..., K
FAIL(l)          failure detected on link (i,l)              l = 1, 2, ..., K
WAKE(l)          link (i,l) becomes operational              l = 1, 2, ..., K
REQ(m)           request for new update cycle with           m = 0, 1, 2, ...
                 n_SINK > m
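The variables of Table 2a and the MSG format of Table 2b map naturally onto small record types. The following is one possible encoding, given as a sketch: the class names, the field `d_link` (for d_il) and all initial values are assumptions made for illustration, since the tables list domains but no initial values.

```python
import math
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Optional


class LinkStatus(Enum):
    DOWN = "DOWN"
    READY = "READY"
    UP = "UP"


@dataclass
class NodeVars:
    p: Optional[int] = None          # preferred neighbor, None stands for nil
    d: float = math.inf              # estimated distance from SINK
    n: int = 0                       # current counter number
    mx: int = 0                      # largest number m received
    ct: int = 0                      # control flag CT
    d_link: Dict[int, int] = field(default_factory=dict)          # d_il per neighbor l
    N: Dict[int, Optional[int]] = field(default_factory=dict)     # N_i(l)
    D: Dict[int, float] = field(default_factory=dict)             # D_i(l)
    F: Dict[int, LinkStatus] = field(default_factory=dict)        # F_i(l)
    z: Dict[int, int] = field(default_factory=dict)               # z_i(l)


@dataclass(frozen=True)
class Msg:
    m: int      # counter number of the update cycle
    d: float    # distance reported by the sender, math.inf for "no path"
    l: int      # identity of the sending neighbor


fresh = NodeVars()
sample = Msg(m=1, d=0, l=2)
```

Using `math.inf` for ∞ keeps the comparisons d < ∞ and d ≠ ∞ from the tables expressible with ordinary numeric operators.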
Table 3 - Algorithm for an Arbitrary Node i


I.1  For REQ(m)

       if p_i ≠ nil, then send REQ(m) to p_i.

I.2  For FAIL(l)

I.2.1  F_i(l) ← DOWN;
I.2.2  CT ← 0;
I.2.3  Execute FINITE-STATE MACHINE;
I.2.4  if p_i ≠ nil, then send REQ(n_i) to p_i.

I.3  For MSG(m,d,l)

I.3.1  if F_i(l) = READY, then F_i(l) ← UP (Fact¹: m > z_i(l));
I.3.2  N_i(l) ← m;
I.3.3  D_i(l) ← d + d_il;
I.3.4  mx_i ← max{m, mx_i};
I.3.5  CT ← 0;
I.3.6  Execute FINITE-STATE MACHINE.

For WAKE(l)

       (Fact: F_i(l) = DOWN)
       wait for end of WAKE synchronization (see Section 2.7);
       if WAKE synchronization is successful, then
           z_i(l) ← max{n_i, n_l};
           F_i(l) ← READY;
           N_i(l) ← nil;
           if p_i ≠ nil, then send REQ(z_i(l)) to p_i.
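Steps I.3.1 through I.3.6 above translate almost line by line into an event handler. The sketch below assumes the node's variables are held in a plain dictionary and that the finite-state machine is passed in as a callback; both choices, and the names `on_msg`, `d_link` and `run_fsm`, are illustrative only.

```python
def on_msg(node, m, d, l, run_fsm):
    """Process MSG(m, d, l) at a node, following steps I.3.1 to I.3.6."""
    if node["F"].get(l) == "READY":            # I.3.1
        node["F"][l] = "UP"
    node["N"][l] = m                           # I.3.2
    node["D"][l] = d + node["d_link"][l]       # I.3.3
    node["mx"] = max(m, node["mx"])            # I.3.4
    node["CT"] = 0                             # I.3.5
    run_fsm(node)                              # I.3.6


# Hypothetical node with one neighbor (7) whose link is still READY.
demo_node = {"F": {7: "READY"}, "N": {}, "D": {}, "d_link": {7: 4}, "mx": 1, "CT": 1}
fsm_calls = []
on_msg(demo_node, 3, 2, 7, fsm_calls.append)
```

Setting CT to 0 before running the state machine is what allows the message to trigger one of the CT = 0 conditions in part II of the table.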



II.  FINITE-STATE MACHINE

State S1

II.1.1  T12  Condition 12   MSG(m = mx_i, d ≠ ∞, l = p_i), CT = 0.
II.1.2       Fact 12        m ≥ n_i.
II.1.3       Action 12      d_i ← min D_i(k) over k: F_i(k) = UP, N_i(k) = m;
II.1.4                      n_i ← m;
II.1.5                      ∀k s.t. F_i(k) = READY, if n_i > z_i(k), then
                            F_i(k) ← UP, N_i(k) ← nil;
II.1.6                      transmit (n_i,d_i) to all k s.t. F_i(k) = UP
                            and k ≠ p_i;
II.1.7                      CT ← 1.

II.2.1  T13  Condition 13   (MSG(l = p_i, d = ∞, m) or FAIL(l = p_i)), CT = 0.
II.2.2       Fact 13        If MSG, then m ≥ n_i.
II.2.3       Action 13      d_i ← ∞;
II.2.4                      if MSG, then n_i ← m;
II.2.5                      ∀k s.t. F_i(k) = READY, if n_i > z_i(k), then
                            F_i(k) ← UP, N_i(k) ← nil;
II.2.6                      transmit (n_i,d_i) to all k s.t. F_i(k) = UP
                            and k ≠ p_i;
II.2.7                      p_i ← nil;
II.2.8                      CT ← 1.


State S2

II.3.1  T21  Condition 21   ∀k s.t. F_i(k) = UP, then N_i(k) = n_i = mx_i;

II.3.2                      ∃k s.t. F_i(k) = UP and D_i(k) ≤ d_i;
II.3.3                      if CT = 0, then MSG;

II.  3. 4 

i *• 

II.3.5       Fact 21        d_i ≠ ∞, p_i ≠ nil.
II.3.6       Action 21      Transmit (n_i,d_i) to p_i;
II.3.7                      p_i ← k* that achieves min D_i(k) over k: F_i(k) = UP;
II.3.8                      ∀k s.t. F_i(k) = UP, set N_i(k) ← nil;
II.3.9                      CT ← 1.

II.4.1  T22  Condition 22   MSG(m = mx_i > n_i, d ≠ ∞, l = p_i), CT = 0.
II.4.2       Action 22      Same as Action 12.

II.5.1  T22̄  Condition 22̄   FAIL(l ≠ p_i), CT = 0.
II.5.2       Action 22̄      CT ← 1.

II.6.1  T23  Condition 23   Same as Condition 13.
II.6.2       Fact 23        Same as Fact 13.
II.6.3       Action 23      Same as Action 13.

State S3

II.7.1  T32  Condition 32   ∃k s.t. F_i(k) = UP, mx_i = N_i(k) >
II.7.2       Fact 32        p_i = nil, d_i = ∞.
II.7.3       Action 32      Let k* achieve min D_i(k) over k: F_i(k) = UP, N_i(k) = mx_i.
II.7.4                      Then p_i ← k*;
II.7.5                      n_i ← mx_i;
II.7.6                      d_i ← D_i(k*);
II.7.7                      ∀k s.t. F_i(k) = READY, if n_i > z_i(k), then
                            F_i(k) ← UP, N_i(k) ← nil;
II.7.8                      transmit (n_i,d_i) to all k s.t. F_i(k) = UP
                            and k ≠ p_i;
II.7.9                      CT ← 1.

State S2̄

II.8.1  T2̄2  Condition 2̄2   MSG(m = mx_i > n_i, d ≠ ∞, l = p_i), CT = 0.
II.8.2       Action 2̄2      Same as Action 12.

II.9.1  T2̄3  Condition 2̄3   Same as Condition 13.
II.9.2       Fact 2̄3        Same as Fact 13.
II.9.3       Action 2̄3      Same as Action 13.
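Action 12 (steps II.1.3 to II.1.7) is reused by several transitions of Table 3, so it is a natural candidate for a helper. The sketch below assumes the node is a dictionary with the fields named after Table 2a; returning the outgoing messages as a list, instead of transmitting them, is a simplification chosen for illustration.

```python
def action_12(node, m):
    """Apply Action 12 for counter number m and return the messages to transmit
    as a list of (destination, n_i, d_i) tuples."""
    # II.1.3: minimum over UP neighbors already heard from in cycle m.
    heard = [k for k, s in node["F"].items() if s == "UP" and node["N"].get(k) == m]
    node["d"] = min(node["D"][k] for k in heard)
    node["n"] = m                                          # II.1.4
    for k, s in node["F"].items():                         # II.1.5
        if s == "READY" and node["n"] > node["z"][k]:
            node["F"][k] = "UP"
            node["N"][k] = None
    out = [(k, node["n"], node["d"])                       # II.1.6
           for k, s in node["F"].items() if s == "UP" and k != node["p"]]
    node["CT"] = 1                                         # II.1.7
    return out


# Hypothetical node: neighbor 1 is preferred, neighbor 3 is waiting in READY.
act_node = {"p": 1, "d": float("inf"), "n": 4, "CT": 0,
            "F": {1: "UP", 2: "UP", 3: "READY"},
            "N": {1: 5, 2: 4}, "D": {1: 3, 2: 1}, "z": {3: 4}}
act_out = action_12(act_node, 5)
```

Note that neighbor 2 is ignored in the minimum even though its D value is smaller, because its last message belongs to an older cycle (N_i(2) = 4, not 5); this is the role of the N_i(k) = m restriction in II.1.3.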
Table 4 - The Algorithm for the SINK

For REQ(m)

    CT ← 0;
    execute FINITE-STATE-MACHINE.

For FAIL(l)

    F_i(l) ← DOWN;
    CT ← 0;
    execute FINITE-STATE-MACHINE.

For MSG(m,d,l)

    N_i(l) ← m;
    CT ← 0;
    execute FINITE-STATE-MACHINE.

For WAKE(l)

    (Fact: F_i(l) = DOWN)
    wait for end of WAKE synchronization;
    if WAKE synchronization is successful, then
        F_i(l) ← READY;
        CT ← 0;
        execute FINITE-STATE-MACHINE.

For START

    CT ← 0;
    execute FINITE-STATE-MACHINE.


FINITE-STATE MACHINE FOR SINK

State S1

T12  Condition 12   (CT = 0) and (REQ(m = n_SINK) or FAIL or WAKE or START).
     Action 12      if (REQ or FAIL or WAKE), then n_SINK ← n_SINK + 1;
                    ∀k s.t. F_i(k) = READY, then F_i(k) ← UP, N_i(k) ← nil;
                    transmit to all k s.t. F_i(k) = UP;
                    CT ← 1.

State S2

T21  Condition 21   ∀k s.t. F_i(k) = UP, then N_i(k) = n_SINK;
                    MSG or START.
     Action 21      ∀k s.t. F_i(k) = UP, then N_i(k) ← nil;
                    CT ← 1.

T22  Condition 22   (CT = 0) and (REQ(m = n_SINK) or FAIL or WAKE).
     Action 22      Same as Action 12.
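The SINK's Condition 12 and Action 12 can be sketched as a single guarded function. This is an illustration under assumptions: the SINK's variables live in a dictionary, the event is passed as a string, the transition moves the SINK to S2, and the transmit step is left out because the table does not spell out the message contents. The function name `sink_t12` is hypothetical.

```python
def sink_t12(sink, event, m=None):
    """Try to start a new update cycle at the SINK; return True if T12 fires."""
    if sink["CT"] != 0:
        return False
    if event not in ("REQ", "FAIL", "WAKE", "START"):
        return False
    if event == "REQ" and m != sink["n"]:
        return False                     # only REQ(m = n_SINK) qualifies
    if event != "START":
        sink["n"] += 1                   # REQ, FAIL and WAKE raise the counter
    for k, s in sink["F"].items():
        if s == "READY":
            sink["F"][k] = "UP"
            sink["N"][k] = None
    sink["CT"] = 1
    sink["state"] = "S2"
    return True


# Hypothetical SINK with counter 4 and one link waiting in READY.
sink_vars = {"n": 4, "CT": 0, "state": "S1", "F": {1: "READY", 2: "UP"}, "N": {}}
stale = sink_t12(sink_vars, "REQ", m=3)      # REQ for an older cycle is ignored
fired = sink_t12(sink_vars, "REQ", m=4)      # matching REQ starts cycle 5
again = sink_t12(sink_vars, "FAIL")          # blocked, CT is already 1
```

The guard on m is what keeps duplicate or outdated requests from advancing the counter more than once per completed cycle, in line with Lemma B.4.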